Database normalization explained

Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency.
Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database using the defined relationships.
Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know as the First Normal Form (1NF) in 1970. Codd went on to define the Second Normal Form (2NF) and Third Normal Form (3NF) in 1971, and Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in 1974. Informally, a relational database table is often described as "normalized" if it is in the Third Normal Form. Most 3NF tables are free of insertion, update, and deletion anomalies.
Objectives of normalization
A basic objective of the first normal form defined by Edgar Frank "Ted" Codd in 1970 was to permit data to be queried and manipulated using a "universal data sub-language" grounded in first-order logic. (SQL is an example of such a data sub-language, albeit one that Codd regarded as seriously flawed.) The objectives of normalization beyond 1NF (First Normal Form) were stated as follows by Codd:
1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

Functional dependency
In a given table, an attribute Y is said to have a functional dependency on a set of attributes X (written X → Y) if and only if each X value is associated with precisely one Y value. For example, in an "Employee" table that includes the attributes "Employee ID" and "Employee Date of Birth", the functional dependency {Employee ID} → {Employee Date of Birth} would hold.
It follows from the previous two sentences that each {Employee ID} is associated with precisely one {Employee Date of Birth}.

Trivial functional dependency
A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.

Full functional dependency
An attribute is fully functionally dependent on a set of attributes X if it is functionally dependent on X, and not functionally dependent on any proper subset of X. {Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}. Even if {Skill} is removed, the functional dependency between {Employee ID} and {Employee Address} still holds.

Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X → Z only by virtue of X → Y and Y → Z.

Multivalued dependency
A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.

Join dependency
A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

Superkey
A superkey is a combination of attributes that can be used to uniquely identify a database record. A table might have many superkeys.

Candidate key
A candidate key is a superkey that contains no extraneous attributes: it is a minimal superkey. Example: A table with the fields <Name>, <Age>, <SSN> and <Phone Extension> has many possible superkeys. Three of these are <SSN>, <Phone Extension, Name> and <SSN, Name>. Of those, <SSN> is a candidate key, while <SSN, Name> is not, because Name is not needed to uniquely identify records ('SSN' here refers to Social Security Number, which is unique to each person). <Phone Extension, Name> is a candidate key only if neither Phone Extension nor Name identifies a record on its own.
Non-prime attribute
A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.

Prime attribute
A prime attribute, conversely, is an attribute that does occur in some candidate key.

Primary key
One candidate key in a relation may be designated the primary key. While that may be a common practice (or even a required one in some environments), it is strictly notational and has no bearing on normalization. With respect to normalization, all candidate keys have equal standing and are treated the same.
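The definitions above can be made concrete with a short Python sketch. It computes the closure of an attribute set under a list of functional dependencies, enumerates candidate keys as minimal superkeys, and splits the attributes into prime and non-prime. The function names and the single dependency declared for the "Employees' Skills" table are assumptions made for illustration; brute-force enumeration is only practical for small tables.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole left side is known, the right side follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(attributes, fds):
    """Enumerate minimal superkeys by trying attribute sets smallest first."""
    attributes = frozenset(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            combo = frozenset(combo)
            # A superset of a key already found cannot be minimal.
            if any(key <= combo for key in keys):
                continue
            if closure(combo, fds) == attributes:
                keys.append(combo)
    return keys

# Assumed dependency for illustration: the address depends on the employee alone.
ATTRIBUTES = {"Employee ID", "Skill", "Employee Address"}
FDS = [(frozenset({"Employee ID"}), frozenset({"Employee Address"}))]

KEYS = candidate_keys(ATTRIBUTES, FDS)
PRIME = frozenset().union(*KEYS)
NON_PRIME = frozenset(ATTRIBUTES) - PRIME

print([sorted(key) for key in KEYS])  # [['Employee ID', 'Skill']]
print(sorted(NON_PRIME))              # ['Employee Address']
```

Under these assumptions the only candidate key is {Employee ID, Skill}, which is why Employee Address comes out as non-prime.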
Normal forms
The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of immunity against logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF. The normal forms are applicable to individual tables; to say that an entire database is in normal form n is to say that all of its tables are in normal form n.

Newcomers to database design sometimes suppose that normalization proceeds in an iterative fashion, i.e. a 1NF design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of how normalization typically works. A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is in 3NF, it is overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to meet the requirements of these higher normal forms. The main normal forms are summarized below.
Normal form: First normal form (1NF)
Defined by: Two versions: E.F. Codd (1970), C.J. Date (2003)
In: 1970 [1] and 2003 [9]
Brief definition: A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain. [10]

Normal form: Second normal form (2NF)
Defined by: E.F. Codd
In: 1971 [2]
Brief definition: No non-prime attribute in the table is functionally dependent on a proper subset of any candidate key.

Normal form: Third normal form (3NF)
Defined by: Two versions: E.F. Codd (1971), C. Zaniolo (1982)
In: 1971 [2] and 1982 [11]
Brief definition: Every non-prime attribute is non-transitively dependent on every candidate key in the table. The attributes that do not contribute to the description of the primary key are removed from the table. In other words, no transitive dependency is allowed.
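The 2NF and 3NF definitions summarized above can be checked mechanically once the candidate keys and functional dependencies of a table are written down. The sketch below is one way to do that in Python, not a definitive implementation. The 3NF test uses the commonly cited equivalent formulation: in every declared dependency X → A, either X is a superkey or A is prime. The function names and both sample tables (including the employee/department table) are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes functionally determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(keys, fds):
    """2NF check: non-prime attributes determined by a proper subset of a key."""
    prime = set().union(*keys)
    found = set()
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                for attr in closure(part, fds) - prime:
                    found.add((part, attr))
    return sorted(found)

def third_nf_violations(attributes, keys, fds):
    """3NF check: in every X -> A, X is a superkey or A is a prime attribute."""
    prime = set().union(*keys)
    found = set()
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(attributes):
            continue  # the determinant is a superkey, so nothing to report
        for attr in set(rhs) - set(lhs) - prime:
            found.add((tuple(sorted(lhs)), attr))
    return sorted(found)

# Hypothetical table 1: key {Employee ID, Skill}, address depends on the ID alone.
skills_keys = [("Employee ID", "Skill")]
skills_fds = [(("Employee ID",), ("Employee Address",))]
skills_partial = partial_dependencies(skills_keys, skills_fds)

# Hypothetical table 2: the location depends on the department, not on the key.
dept_attrs = ("Employee ID", "Department", "Department Location")
dept_keys = [("Employee ID",)]
dept_fds = [
    (("Employee ID",), ("Department",)),
    (("Department",), ("Department Location",)),
]
dept_partial = partial_dependencies(dept_keys, dept_fds)
dept_transitive = third_nf_violations(dept_attrs, dept_keys, dept_fds)

print(skills_partial)   # [(('Employee ID',), 'Employee Address')]
print(dept_partial)     # []
print(dept_transitive)  # [(('Department',), 'Department Location')]
```

The first table fails 2NF because of the partial dependency on Employee ID. The second has a single-attribute key, so it cannot have a partial dependency, but it fails 3NF because Department Location depends on the key only through Department.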
(1) In relational database design, the process of organizing data to minimize redundancy. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

There are three main normal forms, each with increasing levels of normalization:

First Normal Form (1NF): Each field in a table contains different information. For example, in an employee list, each table would contain only one birthdate field.

Second Normal Form (2NF): Each field in a table that is not a determiner of the contents of another field must itself be a function of the other fields in the table.

Third Normal Form (3NF): No duplicate information is permitted. So, for example, if two tables both require a birthdate field, the birthdate information would be separated into a separate table, and the two other tables would then access the birthdate information via an index field in the birthdate table. Any change to a birthdate would automatically be reflected in all tables that link to the birthdate table.

There are additional normalization levels, such as Boyce-Codd Normal Form (BCNF), fourth normal form (4NF) and fifth normal form (5NF). While normalization makes databases more efficient to maintain, it can also make them more complex because data is separated into so many different tables.

(2) In data processing, a process applied to all data in a set that produces a specific statistical property. For example, each expenditure for a month can be divided by the total of all expenditures to produce a percentage.

(3) In programming, changing the format of a floating-point number so the left-most digit in the mantissa is not a zero.
What is Normalization?
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

The Normal Forms
The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article.

First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing duplicative data:
Meet all the requirements of the first normal form.
Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys.
Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.

Boyce-Codd Normal Form (BCNF or 3.5NF)
The Boyce-Codd Normal Form, also referred to as the "third and a half (3.5) normal form", adds one more requirement:
Meet all the requirements of the third normal form.
Every determinant must be a candidate key.

Fourth Normal Form (4NF)
Finally, fourth normal form (4NF) has one additional requirement:
Meet all the requirements of the Boyce-Codd normal form.
Have no non-trivial multivalued dependencies other than those whose determinant is a superkey.

Should I Normalize?
While database normalization is often a good idea, it's not an absolute requirement. In fact, there are some cases where deliberately violating the rules of normalization is a good practice.
First Normal Form (1NF) sets the very basic rules for an organized database:

Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
What do these rules mean when contemplating the practical design of a database? It's actually quite simple. The first rule dictates that we must not duplicate data within the same row of a table. Within the database community, this concept is referred to as the atomicity of a table. Tables that comply with this rule are said to be atomic. Let's explore this principle with a classic example - a table within a human resources database that stores the manager-subordinate relationship. For the purposes of our example, we'll impose the business rule that each manager may have one or more subordinates while each subordinate may have only one manager. Intuitively, when creating a list or spreadsheet to track this information, we might create a table with the following fields:
Manager Subordinate1 Subordinate2 Subordinate3 Subordinate4
However, recall the first rule imposed by 1NF: eliminate duplicative columns from the same table. Clearly, the Subordinate1-Subordinate4 columns are duplicative. Take a moment and ponder the problems raised by this scenario. If a manager has only one subordinate, the Subordinate2-Subordinate4 columns are simply wasted storage space (a precious database commodity). Furthermore, imagine the case where a manager already has 4 subordinates: what happens if she takes on another employee? The whole table structure would require modification.

At this point, a second bright idea usually occurs to database novices: we don't want to have more than one column, and we want to allow for a flexible amount of data storage. Let's try something like this:
Manager Subordinates
where the Subordinates field contains multiple entries in the form "Mary, Bill, Joe"
This solution is closer, but it also falls short of the mark. The Subordinates column is still duplicative and non-atomic. What happens when we need to add or remove a subordinate? We need to read and rewrite the entire contents of the field. That's not a big deal in this situation, but what if one manager had one hundred employees? It also complicates the process of selecting data from the database in future queries. Here's a table that satisfies the first rule of 1NF:
Manager Subordinate
In this case, each subordinate has a single entry, but managers may have multiple entries.

Now, what about the second rule: identify each row with a unique column or set of columns (the primary key)? You might take a look at the table above and suggest the use of the Subordinate column as a primary key. In fact, the Subordinate column is a good candidate for a primary key because our business rules specified that each subordinate may have only one manager. However, the data that we've chosen to store in our table makes this a less than ideal solution. What happens if we hire another employee who shares a name with an existing one, such as a second Jim? How do we store his manager-subordinate relationship in the database? It's best to use a truly unique identifier (such as an employee ID) as a primary key. Our final table would look like this:
Manager ID Subordinate ID
Now, our table is in first normal form!
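As a rough illustration of the final design, the following Python sketch builds the two-column table in an in-memory SQLite database. The table name reports_to, the column names, and the employee IDs are all made up for the example. Making the subordinate ID the primary key is one way to encode the business rule that each subordinate has only one manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reports_to (
        subordinate_id INTEGER PRIMARY KEY,  -- each subordinate appears once
        manager_id     INTEGER NOT NULL
    )
    """
)

# Hypothetical employee IDs: manager 1 has three subordinates, manager 2 has one.
conn.executemany(
    "INSERT INTO reports_to (manager_id, subordinate_id) VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 12), (2, 13)],
)

# Taking on another subordinate is just another row; the schema is unchanged.
conn.execute("INSERT INTO reports_to (manager_id, subordinate_id) VALUES (1, 14)")

team = [
    row[0]
    for row in conn.execute(
        "SELECT subordinate_id FROM reports_to "
        "WHERE manager_id = ? ORDER BY subordinate_id",
        (1,),
    )
]
print(team)  # [10, 11, 12, 14]

# The primary key rejects a second manager for subordinate 10.
try:
    conn.execute("INSERT INTO reports_to (manager_id, subordinate_id) VALUES (2, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

Compare this with the Subordinate1-Subordinate4 layout, where a fifth subordinate would have required altering the table itself.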
Normalizing Your Database: Second Normal Form (2NF)

Putting a Database in Second Normal Form
Recall the general requirements of 2NF:

Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys.
These rules can be summarized in a simple statement: 2NF attempts to reduce the amount of redundant data in a table by extracting it, placing it in new table(s) and creating relationships between those tables.
Let's look at an example. Imagine an online store that maintains customer information in a database. They might have a single table called Customers with the following elements:

CustNum FirstName LastName Address City State ZIP

A brief look at this table reveals a small amount of redundant data. We're storing the "Sea Cliff, NY 11579" and "Miami, FL 33157" entries twice each. Now, that might not seem like too much added storage in our simple example, but imagine the wasted space if we had thousands of rows in our table. Additionally, if the ZIP code for Sea Cliff were to change, we'd need to make that change in many places throughout the database.

In a 2NF-compliant database structure, this redundant information is extracted and stored in a separate table. Our new table (let's call it ZIPs) might have the following fields:

ZIP City State

If we want to be super-efficient, we can even fill this table in advance: the post office provides a directory of all valid ZIP codes and their city/state relationships. Surely, you've encountered a situation where this type of database was utilized. Someone taking an order might have asked you for your ZIP code first and then knew the city and state you were calling from. This type of arrangement reduces operator error and increases efficiency.

Now that we've removed the duplicative data from the Customers table, we've satisfied the first rule of second normal form. We still need to use a foreign key to tie the two tables together. We'll use the ZIP code (the primary key from the ZIPs table) to create that relationship. Here's our new Customers table:

CustNum FirstName LastName Address ZIP

We've now minimized the amount of redundant information stored within the database and our structure is in second normal form!
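The two-table structure described above might look like this in practice. This is a minimal sketch using Python's sqlite3 module with an in-memory database. The customer names, street addresses, and the replacement ZIP code 11580 are invented for the example, and ON UPDATE CASCADE is an added assumption so that a change to a ZIP code propagates to the Customers table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE ZIPs (
        ZIP   TEXT PRIMARY KEY,
        City  TEXT NOT NULL,
        State TEXT NOT NULL
    );
    CREATE TABLE Customers (
        CustNum   INTEGER PRIMARY KEY,
        FirstName TEXT NOT NULL,
        LastName  TEXT NOT NULL,
        Address   TEXT NOT NULL,
        ZIP       TEXT NOT NULL REFERENCES ZIPs (ZIP) ON UPDATE CASCADE
    );
    """
)

conn.executemany(
    "INSERT INTO ZIPs VALUES (?, ?, ?)",
    [("11579", "Sea Cliff", "NY"), ("33157", "Miami", "FL")],
)
# Made-up customers; city and state are no longer repeated on each row.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Example", "1 First St", "11579"),
        (2, "Bob", "Example", "2 Second St", "11579"),
        (3, "Cy", "Example", "3 Third St", "33157"),
    ],
)

# A customer whose ZIP is not in the ZIPs table is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Customers VALUES (4, 'Dee', 'Example', '4 Fourth St', '00000')")
    unknown_zip_rejected = False
except sqlite3.IntegrityError:
    unknown_zip_rejected = True

# If the ZIP code for Sea Cliff changes, one UPDATE covers every customer.
conn.execute("UPDATE ZIPs SET ZIP = '11580' WHERE ZIP = '11579'")

rows = conn.execute(
    """
    SELECT c.CustNum, z.City, z.State, c.ZIP
    FROM Customers AS c JOIN ZIPs AS z ON z.ZIP = c.ZIP
    ORDER BY c.CustNum
    """
).fetchall()
print(rows)
```

The join rebuilds the original wide view of each customer on demand, while city and state are stored once per ZIP code.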
Normalizing Your Database: Third Normal Form (3NF)

Putting a Database in Third Normal Form
Third normal form (3NF) is a database principle that allows you to cleanly organize your tables by building upon the database normalization principles provided by 1NF and 2NF.
There are two basic requirements for a database to be in third normal form:

Already meet the requirements of both 1NF and 2NF.
Remove columns that are not fully dependent upon the primary key.
NORMALIZATION IN DATABASE

ABOUT NORMALIZATION
Normalization is the process of efficiently organizing data in a database. The normalization process has two goals:
Eliminating redundant data
Ensuring data dependencies make sense
The objective is to isolate data so that changes to a field can be made in just one table and then propagated through the rest of the database using the defined relationships.

Normalization Stages
1NF
2NF
3NF

First Normal Form (1NF)
The values in each column of a table are atomic (no multi-valued attributes allowed).
Each table has a primary key: a minimal set of attributes which can uniquely identify a record.
There are no repeating groups: two columns do not store similar information in the same table.

Second Normal Form (2NF)
Meet all the requirements of the first normal form.
Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form (3NF)
Meet all the requirements of the second normal form.
Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every candidate key of R.
Remove columns that are not dependent upon the primary key.

TRANSFORMATION FROM UNF TO HIGHER NORMAL FORMS USING AN EXAMPLE
Example: An institution has two departments. Each department has 6 different students and some courses. Each student in a department can study more than one course. Each course has its course fee. Periodically, a report is generated containing this information, which can be represented in a table.

Conversion to 1st Normal Form
Advantages:
Each table has at least one minimal set of attributes which can uniquely identify a record.
The values in each column have a single value.
Disadvantages:
Redundant data across multiple rows of the table is still there.
Partial and transitive dependencies still exist.

The Dependency Diagram
Depicts all dependencies found within a given table structure.
Helpful in getting a bird's-eye view of all relationships among a table's attributes.
Makes it less likely that an important dependency will be overlooked.

Conversion to 2nd Normal Form
Department Details: {Dep No, Dep_name}
Student Details: {Stud No, Stud_name, Stud_DOB, Stud_age}
Course Details: {Dep No, Stud No, Course_name, Course_fee}

Advantages:
Eliminates redundant data in the table.
Includes no partial dependencies: separate tables are created for sets of values that apply to multiple records.
Disadvantages:
The table contains transitive dependencies: some values depend on attributes other than the table's primary key.

Conversion to 3rd Normal Form
Department Details: {Dep No, Dep_name}
Student Details: {Stud No, Stud_name, Stud_DOB}
Student Course: {Dep No, Stud No, Course_name}
Course Details: {Course_name, Course_fee}

Advantages:
No non-key attribute depends transitively on a candidate key.
All the attributes in a table depend on a single primary key.
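The 3NF decomposition listed above can be tried out on a few rows. The sketch below projects a flat report onto the four relations and then joins them back to confirm that no information is lost. All sample rows are invented, since the original report table is not reproduced here. The project helper and the lowercase column names are likewise assumptions, and Stud_age is left out of the flat rows because the 3NF design drops it.

```python
COLUMNS = ("dep_no", "dep_name", "stud_no", "stud_name", "stud_dob",
           "course_name", "course_fee")

# Invented report rows: one row per (student, course) pair.
FLAT = [
    (1, "Science", 101, "Asha", "2001-04-12", "Physics", 500),
    (1, "Science", 101, "Asha", "2001-04-12", "Chemistry", 450),
    (1, "Science", 102, "Ben", "2000-11-30", "Physics", 500),
    (2, "Arts", 201, "Chen", "2002-01-08", "History", 300),
]

def project(rows, columns, keep):
    """Project rows onto the kept columns and drop duplicate tuples."""
    idx = [columns.index(name) for name in keep]
    return sorted({tuple(row[i] for i in idx) for row in rows})

departments = project(FLAT, COLUMNS, ("dep_no", "dep_name"))
students = project(FLAT, COLUMNS, ("stud_no", "stud_name", "stud_dob"))
student_courses = project(FLAT, COLUMNS, ("dep_no", "stud_no", "course_name"))
courses = project(FLAT, COLUMNS, ("course_name", "course_fee"))

# Join the four relations back together on their shared attributes.
dep_name = dict(departments)
student = {row[0]: row[1:] for row in students}
fee = dict(courses)
rejoined = sorted(
    (dep, dep_name[dep], stud, *student[stud], course, fee[course])
    for dep, stud, course in student_courses
)

print(courses)                   # each fee is now stored once per course
print(rejoined == sorted(FLAT))  # True
```

In this sample the Physics fee appears twice in the flat report but only once in the Course Details relation, which is the redundancy the decomposition is meant to remove.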
