These slides are for use with

Database Systems
Concepts, Languages and Architectures
Paolo Atzeni Stefano Ceri Stefano Paraboschi Riccardo Torlone McGraw-Hill 1999


Chapter 8 Normalization

Database Systems Chapter 8: Normalization

Normal form and normalization


A normal form is a property of a relational database. When a relation is non-normalized (that is, does not satisfy a normal form), then it presents redundancies and produces undesirable behavior during update operations. This principle can be used to carry out quality analysis and constitutes a useful tool for database design. Normalization is a procedure that allows the non-normalized schemas to be transformed into new schemas for which the satisfaction of a normal form is guaranteed.

McGraw-Hill 1999


Example of a relation with anomalies


Employee  Salary  Project  Budget  Function
Brown     20      Mars     2       technician
Green     35      Jupiter  15      designer
Green     35      Venus    15      designer
Hoskins   55      Venus    15      manager
Hoskins   55      Jupiter  15      consultant
Hoskins   55      Mars     2       consultant
Moore     48      Mars     2       manager
Moore     48      Venus    15      designer
Kemp      48      Venus    15      designer
Kemp      48      Jupiter  15      manager

The key is made up of the attributes Employee and Project


Anomalies in the example relation


The salary of each employee is repeated in every tuple relating to that employee: this is a redundancy. If the salary of an employee changes, we have to modify the value in all the corresponding tuples. This problem is known as the update anomaly. If an employee stops working on all the projects but does not leave the company, all the corresponding tuples are deleted and so even the basic information (name and salary) is lost. This problem is known as the deletion anomaly. If we have information on a new employee, we cannot insert it until the employee is assigned to a project. This problem is known as the insertion anomaly.


Why these undesirable phenomena?


Intuitive explanation: we have used a single relation to represent items of information of different types. In particular, the following independent real-world concepts are represented in the relation: employees with their salaries, projects with their budgets, participation of the employees in the projects with their functions. To systematically study the principles introduced informally, it is necessary to use a specific notion: the functional dependency.


Functional dependencies
Given a relation r on a schema R(X) and two non-empty subsets Y and Z of the attributes X, we say that there is a functional dependency on r between Y and Z if, for each pair of tuples t1 and t2 of r having the same values on the attributes Y, t1 and t2 also have the same values on the attributes Z. A functional dependency between the attributes Y and Z is indicated by the notation Y → Z.
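The definition translates almost directly into code. The following is a minimal Python sketch, assuming a relation is represented as a list of dictionaries (one per tuple); the function name `holds` and this representation are illustrative choices, not part of the slides. The sample tuples are taken from the example relation.

```python
def holds(relation, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds on relation.

    relation: list of dicts (one dict per tuple)
    lhs, rhs: lists of attribute names (the sets Y and Z of the definition)
    """
    seen = {}  # maps each combination of lhs values to the rhs values first seen with it
    for t in relation:
        y = tuple(t[a] for a in lhs)
        z = tuple(t[a] for a in rhs)
        if seen.setdefault(y, z) != z:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# Some tuples of the example relation (Function omitted for brevity)
r = [
    {"Employee": "Brown", "Salary": 20, "Project": "Mars", "Budget": 2},
    {"Employee": "Green", "Salary": 35, "Project": "Jupiter", "Budget": 15},
    {"Employee": "Green", "Salary": 35, "Project": "Venus", "Budget": 15},
    {"Employee": "Hoskins", "Salary": 55, "Project": "Mars", "Budget": 2},
]

employee_salary = holds(r, ["Employee"], ["Salary"])  # Employee -> Salary
budget_project = holds(r, ["Budget"], ["Project"])    # Budget -> Project
```

On these tuples the first check succeeds, while the second fails because the budget 15 appears with both Jupiter and Venus. Note that such a check only tells us whether a dependency holds on one instance; whether it is a constraint of the schema is a design decision.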


Functional dependencies in the example schema


Employee → Salary: the salary of each employee is unique and thus each time a certain employee appears in a tuple, the value of his or her salary always remains the same.
Project → Budget: the budget of each project is unique and thus each time a certain project appears in a tuple, the value of its budget always remains the same.


Non-trivial functional dependencies


We then say that a functional dependency Y → Z is non-trivial if no attribute in Z appears among the attributes of Y.
Employee → Salary is a non-trivial functional dependency.
Employee Project → Project is a trivial functional dependency.
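As a small illustration, the test for non-triviality as defined here is a disjointness check between the two sides. This is only a sketch; the function name `is_non_trivial` is hypothetical.

```python
def is_non_trivial(lhs, rhs):
    """Non-trivial in the sense used here: no attribute of rhs appears in lhs."""
    return set(lhs).isdisjoint(rhs)


first = is_non_trivial(["Employee"], ["Salary"])              # Employee -> Salary
second = is_non_trivial(["Employee", "Project"], ["Project"])  # Employee Project -> Project
```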


Anomalies and functional dependencies


In our example, the two properties causing anomalies correspond exactly to attributes involved in functional dependencies: the property "the salary of each employee is unique and depends only on the employee" corresponds to the functional dependency Employee → Salary; the property "the budget of each project is unique and depends only on the project" corresponds to the functional dependency Project → Budget. Moreover, the following property can be formalized by means of a functional dependency: the property "in each project, each of the employees involved can carry out only one function" corresponds to the functional dependency Employee Project → Function.

Dependencies generating anomalies


The first two dependencies generate undesirable redundancies and anomalies. The third dependency however never generates redundancies because, having Employee and Project as a key, the relation cannot contain two tuples with the same values of these attributes. The difference is that Employee Project is a key of the relation.


Boyce-Codd Normal Form (BCNF)

A relation r is in Boyce-Codd normal form if for every (non-trivial) functional dependency X → Y defined on it, X contains a key K of r. That is, X is a superkey for r. Anomalies and redundancies, as discussed above, do not appear in databases with relations in Boyce-Codd normal form, because the independent pieces of information are separate, one per relation.
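The superkey test can be sketched with the usual attribute-closure computation: X is a superkey when the set of attributes determined by X is the whole schema. The Python below is an illustrative sketch only; the names `closure` and `bcnf_violations`, and the encoding of dependencies as pairs of sets, are assumptions. It examines only the dependencies passed in.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the listed dependencies whose left hand side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        # skip dependencies whose right hand side is contained in the left
        if not rhs <= lhs and not closure(lhs, fds) >= set(schema)
    ]


schema = {"Employee", "Salary", "Project", "Budget", "Function"}
fds = [
    ({"Employee"}, {"Salary"}),
    ({"Project"}, {"Budget"}),
    ({"Employee", "Project"}, {"Function"}),
]
violations = bcnf_violations(schema, fds)
```

For the example schema, `violations` contains Employee → Salary and Project → Budget, while Employee Project → Function is accepted because Employee Project determines every attribute.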


Decomposition into Boyce-Codd normal form


Given a relation that does not satisfy Boyce-Codd normal form, we can often replace it with one or more normalized relations using a process called normalization. We can eliminate redundancies and anomalies for the example relation if we replace it with the three relations obtained by projections on the sets of attributes corresponding to the three functional dependencies. The keys of the relations we obtain are the left hand side of a functional dependency: the satisfaction of the Boyce-Codd normal form is therefore guaranteed.


Decomposition of the example relation


Employee  Salary
Brown     20
Green     35
Hoskins   55
Moore     48
Kemp      48

Project  Budget
Mars     2
Jupiter  15
Venus    15

Employee  Project  Function
Brown     Mars     technician
Green     Jupiter  designer
Green     Venus    designer
Hoskins   Venus    manager
Hoskins   Jupiter  consultant
Hoskins   Mars     consultant
Moore     Mars     manager
Moore     Venus    designer
Kemp      Venus    designer
Kemp      Jupiter  manager
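The three relations are plain projections of the original one. A minimal Python sketch of the projection step, assuming the list-of-dictionaries representation (the helper name `project` is illustrative):

```python
def project(relation, attrs):
    """Projection of a relation (list of dicts) on attrs, with duplicates removed."""
    seen = set()
    out = []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


attrs = ["Employee", "Salary", "Project", "Budget", "Function"]
rows = [
    ("Brown", 20, "Mars", 2, "technician"),
    ("Green", 35, "Jupiter", 15, "designer"),
    ("Green", 35, "Venus", 15, "designer"),
    ("Hoskins", 55, "Venus", 15, "manager"),
    ("Hoskins", 55, "Jupiter", 15, "consultant"),
    ("Hoskins", 55, "Mars", 2, "consultant"),
    ("Moore", 48, "Mars", 2, "manager"),
    ("Moore", 48, "Venus", 15, "designer"),
    ("Kemp", 48, "Venus", 15, "designer"),
    ("Kemp", 48, "Jupiter", 15, "manager"),
]
r = [dict(zip(attrs, row)) for row in rows]

employees = project(r, ["Employee", "Salary"])                # one tuple per employee
projects = project(r, ["Project", "Budget"])                  # one tuple per project
assignments = project(r, ["Employee", "Project", "Function"])  # one tuple per participation
```

Each salary and each budget is now stored once, which is what removes the update, deletion and insertion anomalies discussed earlier.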

A relation to be decomposed
Employee  Project  Branch
Brown     Mars     Chicago
Green     Jupiter  Birmingham
Green     Venus    Birmingham
Hoskins   Saturn   Birmingham
Hoskins   Venus    Birmingham

The relation satisfies the functional dependencies Employee → Branch and Project → Branch.


A possible decomposition of the previous relation

Employee  Branch
Brown     Chicago
Green     Birmingham
Hoskins   Birmingham

Project  Branch
Mars     Chicago
Jupiter  Birmingham
Saturn   Birmingham
Venus    Birmingham


The join of the projections


Employee  Project  Branch
Brown     Mars     Chicago
Green     Jupiter  Birmingham
Green     Venus    Birmingham
Hoskins   Saturn   Birmingham
Hoskins   Venus    Birmingham
Green     Saturn   Birmingham
Hoskins   Jupiter  Birmingham

The result is different from the original relation: the last two tuples did not appear in it, so the original information cannot be reconstructed.
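The effect can be reproduced with a few lines of Python. This is a sketch under the assumption that relations are lists of dictionaries; `project` and `natural_join` are illustrative helper names.

```python
def project(relation, attrs):
    """Projection on attrs, with duplicates removed."""
    out = []
    for t in relation:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all common attributes."""
    out = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                joined = {**t1, **t2}
                if joined not in out:
                    out.append(joined)
    return out


r = [
    {"Employee": "Brown", "Project": "Mars", "Branch": "Chicago"},
    {"Employee": "Green", "Project": "Jupiter", "Branch": "Birmingham"},
    {"Employee": "Green", "Project": "Venus", "Branch": "Birmingham"},
    {"Employee": "Hoskins", "Project": "Saturn", "Branch": "Birmingham"},
    {"Employee": "Hoskins", "Project": "Venus", "Branch": "Birmingham"},
]

joined = natural_join(
    project(r, ["Employee", "Branch"]),
    project(r, ["Project", "Branch"]),
)
# Tuples produced by the join that were not in the original relation
spurious = [t for t in joined if t not in r]
```

The join has seven tuples; `spurious` holds the two extra ones (Green on Saturn and Hoskins on Jupiter), produced because Branch alone does not identify either an employee or a project.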


Lossless decomposition
The decomposition of a relation r on X1 and X2 is lossless if the join of the projections of r on X1 and X2 is equal to r itself (that is, not containing spurious tuples). It is clearly desirable, or rather an indispensable requirement, that a decomposition carried out for the purpose of normalization is lossless.


A condition for the lossless decomposition


Let r be a relation on X and let X1 and X2 be two subsets of X such that X1 ∪ X2 = X. Furthermore, let X0 = X1 ∩ X2. If r satisfies the functional dependency X0 → X1 or the functional dependency X0 → X2, then the decomposition of r on X1 and X2 is lossless.
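A sketch of how this condition might be tested from a list of declared dependencies, using attribute closure to decide whether X0 → X1 or X0 → X2 follows from them. The function names and the encoding of dependencies as pairs of sets are assumptions made for illustration. The condition is only sufficient, so a result of False means "not guaranteed by this test".

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_condition(x1, x2, fds):
    """True if X0 = X1 ∩ X2 determines all of X1 or all of X2."""
    determined = closure(x1 & x2, fds)
    return x1 <= determined or x2 <= determined


fds = [({"Employee"}, {"Branch"}), ({"Project"}, {"Branch"})]

# Decomposition sharing only Branch: the condition is not met
on_branch = lossless_condition({"Employee", "Branch"}, {"Project", "Branch"}, fds)

# Decomposition sharing Employee: Employee -> Branch gives X0 -> X1
on_employee = lossless_condition({"Employee", "Branch"}, {"Employee", "Project"}, fds)
```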


A lossless decomposition of the previous relation

Employee  Branch
Brown     Chicago
Green     Birmingham
Hoskins   Birmingham

Employee  Project
Brown     Mars
Green     Jupiter
Green     Venus
Hoskins   Saturn
Hoskins   Venus


Another problem with the new decomposition


Assume we wish to insert a new tuple that specifies the participation of the employee named Armstrong, who works in Birmingham, on the Mars project. In the original relation this update would be immediately identified as illegal, because it would cause a violation of the Project → Branch dependency (Mars is already associated with Chicago). On the decomposed relations, however, it is not possible to detect any violation of the dependency, since the two attributes Project and Branch have been separated: one into one relation and one into the other.


Preservation of dependencies
A decomposition preserves the dependencies if each of the functional dependencies of the original schema involves attributes that appear all together in one of the decomposed schemas. It is clearly desirable that a decomposition preserves the dependencies since, in this way, it is possible to ensure, on the decomposed schema, the satisfaction of the same constraints as the original schema.
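The criterion as stated here is a containment test: each dependency must fit entirely inside one decomposed schema. A minimal Python sketch follows, with the function name `preserves_dependencies` and the set encoding chosen for illustration.

```python
def preserves_dependencies(decomposition, fds):
    """True if every dependency has all its attributes in a single decomposed schema.

    decomposition: list of sets of attribute names
    fds: list of (lhs, rhs) pairs of sets
    """
    return all(
        any(lhs | rhs <= schema for schema in decomposition)
        for lhs, rhs in fds
    )


fds = [({"Employee"}, {"Branch"}), ({"Project"}, {"Branch"})]

# Lossless decomposition of the earlier example: Project and Branch are separated
lossless_one = preserves_dependencies(
    [{"Employee", "Branch"}, {"Employee", "Project"}], fds
)

# Lossy decomposition of the earlier example: both dependencies stay local
lossy_one = preserves_dependencies(
    [{"Employee", "Branch"}, {"Project", "Branch"}], fds
)
```

The first call returns False and the second True, which shows that the two qualities are independent: the lossless decomposition loses Project → Branch, while the decomposition that keeps both dependencies is the one that produced spurious tuples.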


Qualities of decompositions
Decompositions should always satisfy the properties of lossless decomposition and dependency preservation: Lossless decomposition ensures that the information in the original relation can be accurately reconstructed based on the information represented in the decomposed relations. Dependency preservation ensures that the decomposed relations have the same capacity to represent the integrity constraints as the original relations and thus to reveal illegal updates.


A relation not satisfying the BCNF


Manager  Project  Branch
Brown    Mars     Chicago
Green    Jupiter  Birmingham
Green    Mars     Birmingham
Hoskins  Saturn   Birmingham
Hoskins  Venus    Birmingham

Assume that the following dependencies are defined:
Manager → Branch: each manager works at a particular branch;
Project Branch → Manager: each project can have several managers who are responsible for it, but in different branches, and each manager can be responsible for more than one project; however, for each branch, a project has only one manager responsible for it.


A problematic decomposition
The relation is not in Boyce-Codd normal form because the left hand side of the first dependency (Manager → Branch) is not a superkey. At the same time, no good decomposition of this relation is possible: the dependency Project Branch → Manager involves all the attributes and thus no decomposition is able to preserve it. We can therefore state that Boyce-Codd normal form sometimes cannot be achieved.


A new normal form


A relation r is in third normal form if, for each (non-trivial) functional dependency X → Y defined on it, at least one of the following conditions holds:
X contains a key K of r;
each attribute in Y is contained in at least one key of r.
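The two conditions can be checked mechanically once the keys are known. The Python sketch below finds the keys by brute force over subsets of attributes, which is acceptable only for small schemas; all names (`closure`, `candidate_keys`, `is_3nf`) and the encoding of dependencies as pairs of sets are illustrative assumptions, and only the dependencies passed in are examined.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema (brute force)."""
    attrs = sorted(schema)
    keys = []
    for size in range(1, len(attrs) + 1):  # smaller sets first, so minimality is easy to test
        for combo in combinations(attrs, size):
            s = set(combo)
            if closure(s, fds) >= set(schema) and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def is_3nf(schema, fds):
    """Check the two conditions of third normal form on the listed dependencies."""
    in_some_key = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # right hand side contained in the left: nothing to check
        lhs_is_superkey = closure(lhs, fds) >= set(schema)
        if not lhs_is_superkey and not (rhs - lhs) <= in_some_key:
            return False
    return True


# Manager, Project, Branch schema with Manager -> Branch and Project Branch -> Manager
managers = is_3nf(
    {"Manager", "Project", "Branch"},
    [({"Manager"}, {"Branch"}), ({"Project", "Branch"}, {"Manager"})],
)

# Original example schema with its three dependencies
employees = is_3nf(
    {"Employee", "Salary", "Project", "Budget", "Function"},
    [
        ({"Employee"}, {"Salary"}),
        ({"Project"}, {"Budget"}),
        ({"Employee", "Project"}, {"Function"}),
    ],
)
```

The first schema passes because Branch belongs to the key Project Branch; the second fails because Salary and Budget belong to no key.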


BCNF and third normal form


The previous schema does not satisfy the Boyce-Codd normal form, but it satisfies the third normal form: the Project Branch → Manager dependency has as its left hand side a key for the relation, while Manager → Branch has a single attribute on the right hand side, which is part of the Project Branch key. The third normal form is less restrictive than the Boyce-Codd normal form and for this reason does not offer the same guarantees of quality for a relation; it has the advantage, however, of always being achievable.


Decomposition into third normal form


Decomposition into third normal form can proceed as suggested for the Boyce-Codd normal form: a relation that does not satisfy the third normal form is decomposed into relations obtained by projections on the attributes corresponding to the functional dependencies. The only condition to guarantee in this process is that one of the resulting relations always contains a key of the original relation.


A restructuring of the previous relation


Manager  Project  Branch      Division
Brown    Mars     Chicago     1
Green    Jupiter  Birmingham  1
Green    Mars     Birmingham  1
Hoskins  Saturn   Birmingham  2
Hoskins  Venus    Birmingham  2

Functional dependencies:
Manager → Branch Division: each manager works at one branch and manages one division;
Branch Division → Manager: for each branch and division there is a single manager;
Project Branch → Division: for each branch, a project is allocated to a single division and has a sole manager responsible.

A good decomposition of the restructured schema


Project  Branch      Division
Mars     Chicago     1
Jupiter  Birmingham  1
Mars     Birmingham  1
Saturn   Birmingham  2
Venus    Birmingham  2

Manager  Branch      Division
Brown    Chicago     1
Green    Birmingham  1
Hoskins  Birmingham  2

The decomposition is lossless and the dependencies are preserved. This example shows that often the difficulty of achieving Boyce-Codd normal form could be due to an insufficiently accurate analysis of the application.
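Both claims can be checked against the three dependencies of the restructured schema. The following Python sketch applies the sufficient condition for lossless decomposition and the containment criterion for dependency preservation; the helper names and the encoding of dependencies as pairs of sets are illustrative assumptions.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [
    ({"Manager"}, {"Branch", "Division"}),
    ({"Branch", "Division"}, {"Manager"}),
    ({"Project", "Branch"}, {"Division"}),
]
x1 = {"Project", "Branch", "Division"}
x2 = {"Manager", "Branch", "Division"}

# Lossless: the common attributes Branch Division determine all of x2
common = x1 & x2
lossless = x1 <= closure(common, fds) or x2 <= closure(common, fds)

# Preserved: each dependency fits entirely inside x1 or x2
preserved = all(any(lhs | rhs <= x for x in (x1, x2)) for lhs, rhs in fds)
```

Both `lossless` and `preserved` evaluate to True for this decomposition.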


Database design and normalization


The theory of normalization can be used as a basis for quality control operations on schemas, in both the conceptual and logical design phases: the analysis of the relations obtained during the logical design phase can identify places where the conceptual design was inaccurate: this verification of the design is often relatively easy; the ideas on which normalization is based can also be used during the conceptual design phase for the quality control of each element of the conceptual schema.


An entity to undergo a verification of normalization

[Figure: the entity PRODUCT with the attributes Code, ProductName, Supplier, SupplierCode and Address]


Analysis of the entity


The attribute Code constitutes the identifier of the entity. The functional dependency SupplierCode → Supplier Address is verified on the attributes of the entity: all the properties of each supplier are identified by its SupplierCode. The entity violates the third normal form since this dependency has a left hand side that does not contain the identifier and a right hand side made up of attributes that are not part of the key.


The result of the decomposition of an entity

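[Figure: the entity PRODUCT (attributes Code, ProductName) linked by the relationship SUPPLY to the entity SUPPLIER (attributes SupplierCode, Address, Name); cardinality (1,1) on the PRODUCT side and (0,N) on the SUPPLIER side]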


A relationship for which normalization is to be verified


[Figure: the relationship THESIS connecting the entities STUDENT, PROFESSOR, DEPARTMENT and DEGREE PROGRAMME; cardinality labels shown: (0,N), (0,N), (0,N), (0,1)]


Analysis of the relationship


The following functional dependencies can be identified:
STUDENT → DEGREE PROGRAMME
STUDENT → PROFESSOR
PROFESSOR → DEPARTMENT
The (unique) key of the relationship is STUDENT. Therefore, the third functional dependency causes a violation of the third normal form.


The result of the decomposition of a relationship

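[Figure: the relationship THESIS connecting the entities STUDENT, PROFESSOR and DEGREE PROGRAMME, and the relationship AFFILIATION connecting PROFESSOR and DEPARTMENT; cardinality labels shown: (1,1), (0,N), (0,N), (0,1), (0,N)]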


Further observations on the decomposed schema


The relationship THESIS is in third normal form, because its key is made up of the STUDENT entity, and the only dependencies that exist on it are those that have this entity as left hand side. On the other hand, the properties described by the two dependencies are independent of each other: not all students are writing theses and so not all of them have supervisors. From the normalization point of view, this situation does not present problems. However, at the conceptual modelling level, we must distinguish among the various concepts. We can therefore conclude that it would be appropriate to decompose the relationship further, obtaining two relationships, one for each of the two concepts.

The result of a further decomposition of a relationship


[Figure: the entities PROFESSOR, STUDENT, DEPARTMENT and DEGREE PROGRAMME with the relationships THESIS, AFFILIATION and REGISTRATION; cardinality labels shown: (0,N), (0,1), (1,1), (1,1), (0,N), (0,N)]


A relationship that is difficult to decompose

[Figure: the relationship ASSIGNMENT connecting the entities BRANCH, MANAGER and PROJECT, each with cardinality (0,N)]


A restructuring of the previous schema

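[Figure: the entities MANAGER, BRANCH, DIVISION and PROJECT with the relationships MANAGEMENT, COMPOSITION and ASSIGNMENT and the attribute Code; cardinality labels shown: (1,1), (1,1), (0,N), (1,1), (1,N), (0,N)]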


A relationship whose normalization is to be verified

[Figure: the relationship COMPOSITION connecting the entities TEAM, COACH, PLAYER and CITY; cardinality labels shown: (1,N), (0,1), (1,N), (0,1)]
