Normalization is the process of organizing the data in a database to reduce redundancy and
undesirable dependencies, which protects data integrity by preventing insertion, update,
and deletion anomalies. It works by dividing large tables into smaller, more focused ones
while keeping the relationships between those tables consistent.
There are several normal forms (NF), each with its own criteria. These progressively
eliminate different types of dependencies and redundancy.
1st Normal Form (1NF):
Key Criteria:
o Atomicity: Each column contains only atomic (indivisible) values. No field
holds a list, set, or array of values.
o No Repeating Groups: The same attribute is not spread across numbered
columns (for example Phone1, Phone2, Phone3) within one row.
o No Duplicate Rows: All rows in the table must be unique, so the table needs
a primary key that identifies each row.
How it eliminates dependencies:
o It eliminates repeating groups of columns and ensures that each column
represents only one attribute. This simplifies the table by eliminating multi-
valued attributes, but it does not fully resolve other types of redundancy.
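The 1NF step can be sketched in a few lines of Python. The rows, the column names, and the comma-separated "phones" field below are hypothetical and only illustrate moving a multi-valued field into its own table.

```python
# Hypothetical un-normalized rows: "phones" packs several values into one field.
raw_rows = [
    {"student_id": 1, "name": "Ana", "phones": "555-0101, 555-0102"},
    {"student_id": 2, "name": "Ben", "phones": "555-0199"},
]

def to_1nf(rows):
    """Return (students, student_phones) with one atomic value per field."""
    students = []
    student_phones = []
    for row in rows:
        students.append((row["student_id"], row["name"]))
        # Each phone number becomes its own row, keyed by the student.
        for phone in row["phones"].split(","):
            student_phones.append((row["student_id"], phone.strip()))
    return students, student_phones

students, student_phones = to_1nf(raw_rows)
print(students)
print(student_phones)
```

The result is in 1NF, but it says nothing yet about partial or transitive dependencies, which the later forms address.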
2nd Normal Form (2NF):
Key Criteria:
o 1NF Compliance: The table must first satisfy all the criteria of 1NF.
o No Partial Dependency: Every non-prime attribute (an attribute that is not
part of any candidate key) must be fully functionally dependent on every
candidate key. No non-prime attribute may depend on only part of a composite key.
How it eliminates dependencies:
o Partial Dependencies: If a table has a composite primary key (a primary key
made up of more than one attribute), 2NF eliminates partial dependencies,
where a non-prime attribute depends only on part of the composite key rather
than the entire key. A 1NF table whose candidate keys are all single attributes
is automatically in 2NF.
Example: Consider a table where a composite primary key is (Student_ID,
Course_ID):
o If Instructor_Name depends only on Course_ID (not on the full composite
key), it violates 2NF.
o To fix this, Instructor_Name should be moved to a separate table where
Course_ID is the primary key.
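One possible shape for this decomposition, sketched with Python's built-in sqlite3 module. The table names, the grade column, and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Course facts are stored once, keyed by course_id alone.
    CREATE TABLE course (
        course_id       TEXT PRIMARY KEY,
        instructor_name TEXT NOT NULL
    );
    -- Enrollment keeps only attributes that depend on the whole composite key.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL,
        course_id  TEXT NOT NULL REFERENCES course(course_id),
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)
    );
""")
conn.execute("INSERT INTO course VALUES ('DB101', 'Dr. Rao')")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?)",
    [(1, "DB101", "A"), (2, "DB101", "B")],
)

# Changing the instructor touches one row, not one row per enrolled student.
conn.execute("UPDATE course SET instructor_name = 'Dr. Lee' WHERE course_id = 'DB101'")

rows = conn.execute("""
    SELECT e.student_id, e.course_id, c.instructor_name
    FROM enrollment AS e JOIN course AS c ON c.course_id = e.course_id
    ORDER BY e.student_id
""").fetchall()
print(rows)
```

Because the instructor name lives in exactly one place, the update cannot leave some enrollments showing the old name.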
3rd Normal Form (3NF):
Key Criteria:
o 2NF Compliance: The table must first satisfy all the criteria of 2NF.
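o No Transitive Dependency: No non-prime attribute should depend on the key
through another non-prime attribute. In other words, if A -> B and B -> C,
where A is the key and B is a non-prime attribute, then C depends on A only
transitively (through B) and must be moved to a table keyed by B.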
How it eliminates dependencies:
o It resolves transitive dependencies, where a non-key attribute depends on
another non-key attribute. By ensuring non-prime attributes depend only on
the candidate keys, 3NF stores each such fact once, so updating it cannot leave
inconsistent copies.
Example: In a table where Student_ID is the primary key, and there is a dependency
like Student_ID -> Advisor_ID, Advisor_ID -> Advisor_Name, then
Student_ID -> Advisor_Name is a transitive dependency, which 3NF eliminates.
The Advisor_Name should be placed in a separate table with Advisor_ID as the
primary key.
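The split described in this example might look like the following in plain Python. The sample rows and names are made up; dictionaries stand in for the two tables.

```python
# Hypothetical flat rows: (student_id, advisor_id, advisor_name).
flat = [
    (1, 10, "Dr. Kim"),
    (2, 10, "Dr. Kim"),   # the advisor's name is repeated for every advisee
    (3, 20, "Dr. Osei"),
]

# Student table: keeps only the attribute that depends directly on Student_ID.
student = {sid: aid for sid, aid, _ in flat}

# Advisor table: stores each advisor's name once, keyed by Advisor_ID.
advisor = {aid: name for _, aid, name in flat}

# Joining the two tables on Advisor_ID reproduces the original rows.
rejoined = [(sid, aid, advisor[aid]) for sid, aid in sorted(student.items())]
print(advisor)
print(rejoined == flat)  # True
```

The advisor table has two entries for three students, which is the redundancy that the transitive dependency caused.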
Boyce-Codd Normal Form (BCNF):
Key Criteria:
o 3NF Compliance: The table must first satisfy all the criteria of 3NF.
o Every Determinant Is a Superkey: For every non-trivial functional dependency
X -> Y, the determinant X (the attribute or set of attributes on the left-hand
side) must be a superkey, that is, it must contain a candidate key. Unlike 3NF,
BCNF makes no exception when Y is a prime attribute.
How it eliminates dependencies:
o BCNF is a stricter version of 3NF. 3NF still allows a determinant that is
not a superkey as long as the attribute it determines is part of a candidate
key. BCNF removes that exception, which eliminates the remaining anomalies
caused by functional dependencies.
Example: Consider a table with Student_ID, Course_ID, and Instructor_ID as
columns, where (Student_ID, Course_ID) -> Instructor_ID and, assuming each instructor
teaches only one course, Instructor_ID -> Course_ID. The table is in 3NF because
Course_ID is part of a candidate key, but it violates BCNF because Instructor_ID is a
determinant and not a superkey. To resolve this, we would split the table into
(Instructor_ID, Course_ID) and (Student_ID, Instructor_ID).
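A BCNF check can be approximated on sample data, as in the sketch below. It assumes a hypothetical rule that each instructor teaches exactly one course (Instructor_ID -> Course_ID); the helper names and rows are illustrative. Sample data can refute a dependency but can never prove that it holds in general.

```python
def holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in these sample rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True

def is_superkey(rows, attrs):
    """True if no two sample rows agree on attrs."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(keys) == len(set(keys))

# Hypothetical enrollment rows.
rows = [
    {"student": 1, "course": "DB", "instructor": "Rao"},
    {"student": 2, "course": "DB", "instructor": "Rao"},
    {"student": 1, "course": "OS", "instructor": "Lee"},
    {"student": 3, "course": "DB", "instructor": "Kim"},
]

# instructor -> course holds, yet instructor alone does not identify a row.
violates_bcnf = holds(rows, ["instructor"], ["course"]) and not is_superkey(rows, ["instructor"])
print(violates_bcnf)  # True
```

Here the determinant of the violating dependency is not a superkey, so a BCNF decomposition would move (instructor, course) into its own table.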
4th Normal Form (4NF):
Key Criteria:
o BCNF Compliance: The table must first satisfy all the criteria of BCNF.
o No Non-trivial Multi-valued Dependencies: For every non-trivial multi-valued
dependency X ->> Y, X must be a superkey. A multi-valued dependency X ->> Y holds
when the set of Y values associated with an X value is independent of the
table's remaining attributes.
How it eliminates dependencies:
o 4NF addresses multi-valued dependencies, where a single column might be
associated with multiple values, but those values are independent of each
other. The goal is to separate these independent relationships into different
tables to avoid redundancy.
Example: A table that stores employees together with their skills and the languages
they speak, (Employee_ID, Skill, Language), has multi-valued dependencies if skills and
languages are independent of each other: every skill must be paired with every language.
To resolve this, we would split it into (Employee_ID, Skill) and (Employee_ID, Language).
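A multi-valued dependency causes redundancy only when a table mixes two independent multi-valued facts. The sketch below assumes a hypothetical second attribute, Language, stored alongside Skill; all values are invented.

```python
from itertools import product

# Two independent facts about one hypothetical employee.
skills = {"E1": {"SQL", "Python"}}
languages = {"E1": {"English", "Hindi"}}

# A single (employee, skill, language) table must hold every combination.
combined = {
    (emp, skill, lang)
    for emp in skills
    for skill, lang in product(skills[emp], languages[emp])
}

# 4NF decomposition: one table per independent fact.
emp_skill = {(emp, skill) for emp, skill, _ in combined}
emp_lang = {(emp, lang) for emp, _, lang in combined}

# Joining the two tables on employee gives back the combined table exactly.
rejoined = {
    (emp, skill, lang)
    for emp, skill in emp_skill
    for emp2, lang in emp_lang
    if emp == emp2
}
print(len(combined), len(emp_skill), len(emp_lang))  # 4 2 2
print(rejoined == combined)                          # True
```

With m skills and n languages the combined table needs m * n rows per employee, while the two separate tables need only m + n.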
5th Normal Form (5NF):
Key Criteria:
o 4NF Compliance: The table must first satisfy all the criteria of 4NF.
o No Non-trivial Join Dependency: Every non-trivial join dependency in the table
must be implied by its candidate keys. Roughly speaking, the table cannot be
split losslessly into smaller tables except by projections that each contain a
candidate key.
How it eliminates dependencies:
o 5NF resolves join dependencies, where a table can be reconstructed exactly by
joining three or more of its projections even though no pair of them is
enough. Storing those projections separately removes the redundancy, and the
join returns the original rows with no lost or spurious tuples.
Example: In a table with Project_ID, Employee_ID, and Skill_ID, suppose the rule is
that an employee who has a skill and works on a project that needs that skill always
uses it on that project. The table then equals the join of (Project_ID, Employee_ID),
(Employee_ID, Skill_ID), and (Project_ID, Skill_ID), and 5NF stores those three
tables instead.
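Whether a three-way join dependency holds can be tested on sample data by projecting the table onto its three attribute pairs and joining them back. The sketch below uses invented rows and helper names; it checks only these samples and is not a proof about the schema.

```python
def project(rows, idx):
    """Project a set of tuples onto the column positions in idx."""
    return {tuple(row[i] for i in idx) for row in rows}

def join3(pe, es, ps):
    """Natural join of (project, employee), (employee, skill), (project, skill)."""
    return {
        (p, e, s)
        for p, e in pe
        for e2, s in es
        if e == e2 and (p, s) in ps
    }

def rejoin(rows):
    """Split rows into the three pairwise projections and join them back."""
    return join3(project(rows, (0, 1)), project(rows, (1, 2)), project(rows, (0, 2)))

# Hypothetical (project, employee, skill) rows where the join dependency fails.
without_rule = {("P1", "E1", "S2"), ("P1", "E2", "S1"), ("P2", "E1", "S1")}
# The same rows plus the tuple the join would otherwise invent.
with_rule = without_rule | {("P1", "E1", "S1")}

spurious = rejoin(without_rule) - without_rule
print(spurious)                         # {('P1', 'E1', 'S1')}
print(rejoin(with_rule) == with_rule)   # True
```

When the rejoined result contains extra tuples, the decomposition is lossy for that data and the table should stay whole. When it matches exactly for every legal state of the table, the three projections can replace it.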
6th Normal Form (6NF):
Key Criteria:
o 5NF Compliance: The table must first satisfy all the criteria of 5NF.
o No Non-trivial Join Dependencies at All: The table cannot be decomposed
losslessly any further, so each table holds a key plus at most one other
attribute. 6NF is used mainly for temporal data (data that changes over
time), where each attribute needs its own history.
How it eliminates dependencies:
o 6NF resolves dependencies that involve time-related changes. It ensures that
each data point is recorded in such a way that it can reflect its time-specific
values without redundancy.
Example: A table storing employee positions over time could be normalized in 6NF
to record the history of position changes, ensuring that each record reflects a specific
point in time and removes any redundant or conflicting time-based data.
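A rough sketch of this idea in Python: each time-varying attribute gets its own history table, and a row is recorded only when that attribute changes. The department column, the dates, and the helper name are assumptions added for illustration.

```python
# Hypothetical snapshots: (employee_id, valid_from, position, department).
snapshots = [
    (7, "2021-01-01", "Analyst", "Finance"),
    (7, "2022-06-01", "Senior Analyst", "Finance"),
    (7, "2023-03-01", "Senior Analyst", "Risk"),
]

def attribute_history(rows, col):
    """Keep (employee_id, valid_from, value) only when the value changes."""
    history = []
    last = {}  # most recent value seen per employee
    for row in sorted(rows, key=lambda r: (r[0], r[1])):
        emp, start, value = row[0], row[1], row[col]
        if last.get(emp) != value:
            history.append((emp, start, value))
            last[emp] = value
    return history

# One table per attribute: key (employee_id, valid_from) plus a single value.
position_history = attribute_history(snapshots, 2)
department_history = attribute_history(snapshots, 3)
print(position_history)
print(department_history)
```

Each history table has the key (employee_id, valid_from) and exactly one other attribute, so a change of department no longer repeats the unchanged position.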
Summary of Normal Forms:
1. 1NF: Eliminate repeating groups and ensure atomic values.
2. 2NF: Eliminate partial dependencies (dependencies on part of a composite primary
key).
3. 3NF: Eliminate transitive dependencies (non-key attributes depending on other non-
key attributes).
4. BCNF: Every determinant is a superkey.
5. 4NF: Eliminate multi-valued dependencies.
6. 5NF: Eliminate join dependencies that are not implied by the candidate keys.
7. 6NF: Reduce each table to a key plus at most one other attribute (used mainly for
temporal data).
Each normal form builds upon the previous one by addressing more complex types of
dependencies, which reduces redundancy and protects data integrity. The trade-off is that
queries over highly normalized tables usually need more joins.