
Third normal form

From Wikipedia, the free encyclopedia

Third normal form is a normal form that is used in normalizing a database
design to reduce the duplication of data and ensure referential integrity by
ensuring that (1) the entity is in second normal form, and (2) all the attributes in a
table are determined only by the candidate keys of that table and not by any
non-prime attributes. 3NF was designed to improve database processing while
minimizing storage costs. 3NF data modeling was ideal for online transaction
processing (OLTP) applications with heavy order-entry workloads.[1]

Definition of third normal form


The third normal form (3NF) is a normal form used in database normalization.
3NF was originally defined by E.F. Codd in 1971.[2] Codd's definition states that a
table is in 3NF if and only if both of the following conditions hold:

The relation R (table) is in second normal form (2NF).

Every non-prime attribute of R is non-transitively dependent on every key of R.

A non-prime attribute of R is an attribute that does not belong to any candidate
key of R.[3] A transitive dependency is a functional dependency in which X → Z (X
determines Z) indirectly, by virtue of X → Y and Y → Z (where it is not the case that
Y → X).[4]
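
The terms used in this definition can be made concrete with a small sketch. The Python below is an illustration only, not part of Codd's paper: the helper names closure and candidate_keys and the relation R(X, Y, Z) are assumptions chosen to mirror the symbols in the paragraph above. It computes attribute closures by repeated application of the functional dependencies, finds candidate keys by brute force, and then tests whether X → Z is transitive.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD whenever its left side is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attributes, fds):
    """Brute-force search for minimal attribute sets that determine everything."""
    attributes = frozenset(attributes)
    keys = []
    # Smaller sets are tried first, so any superset of a found key is skipped.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# Hypothetical relation R(X, Y, Z) with X -> Y and Y -> Z.
attributes = frozenset({"X", "Y", "Z"})
fds = [
    (frozenset({"X"}), frozenset({"Y"})),
    (frozenset({"Y"}), frozenset({"Z"})),
]

keys = candidate_keys(attributes, fds)
prime = frozenset().union(*keys)
non_prime = attributes - prime

# X determines Z only by way of Y, and Y does not determine X,
# so X -> Z is a transitive dependency in the sense defined above.
z_is_transitive = (
    "Z" in closure({"X"}, fds)
    and "Z" in closure({"Y"}, fds)
    and "X" not in closure({"Y"}, fds)
)
```

In this hypothetical relation the only candidate key is {X}, so Y and Z are non-prime, and Z depends on the key transitively. The relation therefore fails the second condition of Codd's definition. The brute-force key search is exponential in the number of attributes and is meant only for small examples.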

A 3NF definition that is equivalent to Codd's, but expressed differently, was given
by Carlo Zaniolo in 1982. This definition states that a table is in 3NF if and only if,
for each of its functional dependencies X → A, at least one of the following
conditions holds:

X contains A (that is, X → A is a trivial functional dependency), or

X is a superkey, or

Every element of A − X, the set difference between A and X, is a prime attribute
(i.e., each attribute in A − X is contained in some candidate key).[5][6]
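
The three conditions above lend themselves to a mechanical check. The following Python is a minimal sketch under stated assumptions: the function names (fd, closure, candidate_keys, violations_3nf) and both example relations are hypothetical, attribute names are single characters, and only the dependencies passed in are examined rather than every dependency they imply.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from strings of single-character attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attributes, fds):
    """Brute-force search for minimal attribute sets that determine everything."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def violations_3nf(attributes, fds):
    """Return the supplied FDs X -> A that satisfy none of the three conditions."""
    attributes = frozenset(attributes)
    prime = frozenset().union(*candidate_keys(attributes, fds))
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:                          # condition 1: trivial FD
            continue
        if closure(lhs, fds) == attributes:     # condition 2: X is a superkey
            continue
        if (rhs - lhs) <= prime:                # condition 3: A - X is all prime
            continue
        bad.append((lhs, rhs))
    return bad


# Hypothetical relation 1: AB -> C and C -> B. Candidate keys are AB and AC,
# so every attribute is prime and C -> B passes under the third condition.
ok_case = violations_3nf("ABC", [fd("AB", "C"), fd("C", "B")])

# Hypothetical relation 2: A -> B and B -> C. The only candidate key is A,
# so B -> C has a non-superkey left side and a non-prime right side.
bad_case = violations_3nf("ABC", [fd("A", "B"), fd("B", "C")])
```

Here ok_case is an empty list and bad_case contains the single dependency B → C. Dropping the third test from violations_3nf would make the check stricter, so that relation 1 would also be reported because of C → B.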

Zaniolo's definition gives a clear sense of the difference between 3NF and the
more stringent Boyce–Codd normal form (BCNF). BCNF simply eliminates the
third alternative ("Every element of A-X, the set difference between A and X, is a
prime attribute").

"Nothing but the key"

An approximation of Codd's definition of 3NF, paralleling the traditional pledge to
give true evidence in a court of law, was given by Bill Kent: "[Every] non-key
[attribute] must provide a fact about the key, the whole key, and nothing but the
key."[7]

Requiring existence of "the key" ensures that the table is in 1NF; requiring that
non-key attributes be dependent on "the whole key" ensures 2NF; further
requiring that non-key attributes be dependent on "nothing but the key" ensures
3NF. While this phrase is a useful mnemonic, the fact that it only mentions a
single key means it defines some necessary but not sufficient conditions to satisfy
the 2nd and 3rd Normal Forms. Both 2NF and 3NF are concerned equally with all
candidate keys of a table and not just any one key.

Chris Date refers to Kent's summary as "an intuitively attractive characterization"
of 3NF, and notes that with slight adaptation it may serve as a definition of the
slightly stronger Boyce–Codd normal form: "Each attribute must represent a fact
about the key, the whole key, and nothing but the key." [8] The 3NF version of the
definition is weaker than Date's BCNF variation, as the former is concerned only
with ensuring that non-key attributes are dependent on keys. Prime attributes
(which are keys or parts of keys) must not be functionally dependent at all; they
each represent a fact about the key in the sense of providing part or all of the key
itself. (It should be noted here that this rule applies only to functionally dependent
attributes, as applying it to all attributes would implicitly prohibit composite
candidate keys, since each part of any such key would violate the "whole key"
clause.)

An example of a 2NF table that fails to meet the requirements of 3NF is:

Tournament Winners

Tournament            Year  Winner          Winner Date of Birth
Indiana Invitational  1998  Al Fredrickson  21 July 1975
Cleveland Open        1999  Bob Albertson   28 September 1968
Des Moines Masters    1999  Al Fredrickson  21 July 1975
Indiana Invitational  1999  Chip Masterson  14 March 1977

Because each row in the table needs to tell us who won a particular Tournament in
a particular Year, the composite key {Tournament, Year} is a minimal set of
attributes guaranteed to uniquely identify a row. That is, {Tournament, Year} is a
candidate key for the table.

The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is
transitively dependent on the candidate key {Tournament, Year} via the non-prime
attribute Winner. The fact that Winner Date of Birth is functionally dependent on
Winner makes the table vulnerable to logical inconsistencies, as there is nothing
to stop the same person from being shown with different dates of birth on
different records.
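
The vulnerability can be illustrated with a short Python sketch. The helper fd_holds and the mistyped date are assumptions made for this example; the rows are those of the Tournament Winners table. Note that a sample of rows can only refute a functional dependency, never prove that it holds for every possible state of the table.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the rows are consistent with the dependency lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; a different value
        # for the same key later on contradicts the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


columns = ["Tournament", "Year", "Winner", "Winner Date of Birth"]
data = [
    ("Indiana Invitational", 1998, "Al Fredrickson", "21 July 1975"),
    ("Cleveland Open", 1999, "Bob Albertson", "28 September 1968"),
    ("Des Moines Masters", 1999, "Al Fredrickson", "21 July 1975"),
    ("Indiana Invitational", 1999, "Chip Masterson", "14 March 1977"),
]
rows = [dict(zip(columns, record)) for record in data]

# The original rows are consistent with Winner -> Winner Date of Birth.
consistent_before = fd_holds(rows, ["Winner"], ["Winner Date of Birth"])

# Hypothetical bad update: one of Al Fredrickson's rows gets a mistyped year.
edited = [dict(row) for row in rows]
edited[2]["Winner Date of Birth"] = "21 July 1957"

# The candidate key is still respected, so a key constraint alone
# would not reject the edit ...
key_still_holds = fd_holds(edited, ["Tournament", "Year"],
                           ["Winner", "Winner Date of Birth"])
# ... but the same person now appears with two different dates of birth.
consistent_after = fd_holds(edited, ["Winner"], ["Winner Date of Birth"])
```

After the edit, key_still_holds remains true while consistent_after is false, which is exactly the kind of logical inconsistency the table design permits.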

In order to express the same facts without violating 3NF, it is necessary to split
the table into two:

Tournament Winners

Tournament            Year  Winner
Indiana Invitational  1998  Al Fredrickson
Cleveland Open        1999  Bob Albertson
Des Moines Masters    1999  Al Fredrickson
Indiana Invitational  1999  Chip Masterson

Winner Dates of Birth

Winner          Date of Birth
Chip Masterson  14 March 1977
Al Fredrickson  21 July 1975
Bob Albertson   28 September 1968

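Update anomalies of the kind described above cannot occur in these tables, because
Winner is now the key of the second table, which allows only one date of birth
per winner.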
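
As a rough illustration of the split, the Python sketch below projects the original rows onto the two smaller tables and joins them back on Winner. The variable names are assumptions made for this example; the data is that of the original Tournament Winners table.

```python
# Rows of the original table:
# (Tournament, Year, Winner, Winner Date of Birth)
original = [
    ("Indiana Invitational", 1998, "Al Fredrickson", "21 July 1975"),
    ("Cleveland Open", 1999, "Bob Albertson", "28 September 1968"),
    ("Des Moines Masters", 1999, "Al Fredrickson", "21 July 1975"),
    ("Indiana Invitational", 1999, "Chip Masterson", "14 March 1977"),
]

# Projection 1: who won which tournament in which year.
tournament_winners = [(t, y, w) for t, y, w, _ in original]

# Projection 2: one date of birth per winner. A dict keyed by Winner
# stores each date exactly once, mirroring Winner being the key.
winner_dates_of_birth = {}
for _, _, winner, dob in original:
    winner_dates_of_birth[winner] = dob

# Joining the two projections on Winner reconstructs the original rows,
# so no fact is lost by the decomposition.
rejoined = [
    (t, y, w, winner_dates_of_birth[w]) for t, y, w in tournament_winners
]
```

The rejoined list equals the original list, and the second projection holds three entries instead of four because Al Fredrickson's date of birth is no longer repeated.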

Derivation of Zaniolo's conditions


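The definition of 3NF offered by Carlo Zaniolo in 1982, and given above, can be
derived from Codd's definition in the following way: Let X → A be a nontrivial FD
(i.e. one where X does not contain A) and let A be a non-key attribute. Also let Y
be a key of R. Then Y → X. Therefore A is not transitively dependent on Y if and
only if X → Y also holds, that is, if and only if X is a superkey of R.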

Normalization beyond 3NF


Most 3NF tables are free of update, insertion, and deletion anomalies. Certain
types of 3NF tables, rarely met with in practice, are affected by such anomalies;
these are tables which either fall short of Boyce–Codd normal form (BCNF) or, if
they meet BCNF, fall short of the higher normal forms 4NF or 5NF.

Considerations for Use in Reporting Environments

While 3NF was ideal for machine processing, the segmented nature of the data
model made it difficult for a human user to consume. Analytics via query, reporting,
and dashboards required a different type of data model, one that supported analysis
such as trend lines, period-to-date calculations (month-to-date, quarter-to-date,
year-to-date), cumulative calculations, basic statistics (average, standard
deviation, moving averages) and previous-period comparisons (year ago, month
ago, week ago). Examples include dimensional modeling and, beyond dimensional
modeling, the flattening of star schemas via Hadoop and data science.[9][10]

See also
Attribute-value system

References
1. "What is Third Normal Form?"
(http://www.techopedia.com/definition/22561/third-normal-form-3nf) Cory Janssen,
Techopedia, retrieved 24 April 2014
2. Codd, E.F. "Further Normalization of the Data Base Relational Model." (Presented at
Courant Computer Science Symposia Series 6, "Data Base Systems," New York City,
May 24th–25th, 1971.) IBM Research Report RJ909 (August 31st, 1971).
Republished in Randall J. Rustin (ed.), Data Base Systems: Courant Computer
Science Symposia Series 6. Prentice-Hall, 1972.
3. Codd, p. 43.
4. Codd, p. 45–46.
5. Zaniolo, Carlo. "A New Normal Form for the Design of Relational Database
Schemata." ACM Transactions on Database Systems 7(3), September 1982.
6. Abraham Silberschatz, Henry F. Korth, S. Sudarshan, Database System Concepts
(http://www.db-book.com/) (5th edition), p. 276-277
7. Kent, William. "A Simple Guide to Five Normal Forms in Relational Database
Theory" (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM 26(2),
Feb. 1983, pp. 120–125.
8. Date, C.J. An Introduction to Database Systems (7th ed.) (Addison Wesley, 2000), p.
379.
9. [1] (http://roelantvos.com/blog/?p=740).
10. [2] (https://infocus.emc.com/william_schmarzo/hadoop-data-modeling-lessons-vin-diesel/).

Further reading
Date, C. J. (1999), An Introduction to Database Systems
(http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html) (8th ed.).
Addison-Wesley Longman. ISBN 0-321-19784-4.
Kent, W. (1983), A Simple Guide to Five Normal Forms in Relational Database Theory
(http://www.bkent.net/Doc/simple5.htm), Communications of the ACM, vol. 26,
pp. 120–125
