
NORMALISATION

Bamweyana Ivan
LSG: 2102
Relation Rules (Codd, 1970)
1. Only one value in each cell (intersection of row and
column)
2. All values in a column are about the same subject
3. Each row is unique
4. No significance in column sequence
5. No significance in row sequence
Normalisation
• Process of converting tables to conform to Codd’s
relational rules
• Split tables into new tables that can be joined at query
time
• The relational join
• Several levels of normalization
• Forms: 1NF, 2NF, 3NF, 4NF and 5NF.
• Normalization creates many “expensive” joins
Normalisation
• Database normalization is the process of organizing
the fields and tables of a relational database to
minimize redundancy and dependency.
• Normalization usually involves dividing large tables into
smaller (and less redundant) tables and defining
relationships between them.
• The objective is to isolate data so that additions, deletions,
and modifications of a field can be made in just one table
and then propagated through the rest of the database via
the defined relationships. (wikipedia)
Normalisation
• There are two goals of the normalization process:
◦ eliminating redundant data (for example, storing the same data in more than one table), and
◦ ensuring data dependencies make sense (only storing related data in a table).

• Both of these are worthy goals, as they
◦ ensure that data is stored logically, avoiding inconsistency and unreliability in the data
◦ reduce the amount of space a database consumes
Relational joins
• Every table must have a “primary key”
• A column (or combination of columns) holding a unique value for
each tuple
• Joins are made by matching the value of one table's “primary key”
with the same value stored in a column of another table
• that matching column is called the “foreign key”
• Joins may be extended to third and subsequent tables –
hence normalisation
• Tables must adhere to “normal form” (be normalised)
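As a rough illustration of a primary key / foreign key join, the sketch below uses Python's built-in sqlite3 module with the Employee and Department tables from the 3NF example later in these notes. The lower-case table names and column types are assumptions made for the example.

```python
import sqlite3

# In-memory database; table and column names follow the lecture's example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_no   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    -- dept_no is a foreign key; SQLite only enforces REFERENCES
    -- when PRAGMA foreign_keys = ON is set.
    CREATE TABLE employee (
        emp_no  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_no INTEGER REFERENCES department(dept_no)
    );
    INSERT INTO department VALUES (201, 'R&D'), (224, 'IT');
    INSERT INTO employee VALUES
        (1, 'Kevin Jacobs', 201),
        (2, 'Barbara Jones', 224),
        (3, 'Jake Rivera', 201);
""")

# Join: match each employee's foreign key to the department's primary key.
rows = conn.execute("""
    SELECT e.emp_no, e.name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON e.dept_no = d.dept_no
    ORDER BY e.emp_no
""").fetchall()

for row in rows:
    print(row)
```

The department name is stored once, in department, and recovered for each employee at query time through the join.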
Normal Forms
• Normal forms are a series of developed guidelines for
ensuring that databases are normalized.
• They are numbered from one (the lowest form of
normalization, referred to as first normal form or 1NF)
through five (fifth normal form or 5NF).
• In practical applications, you'll often see 1NF, 2NF and 3NF
along with the occasional 4NF. Fifth normal form is very
rarely seen.
• It is occasionally necessary to stray from them to meet practical
business requirements.
1st Normal Form
• Each attribute must be atomic
• Atomic means it cannot be further decomposed and simplified
• No repeating columns within a row.
• No multi-valued columns.
• 1NF simplifies attributes
• A relation is said to be in 1NF if it contains no
non-atomic values and each row can provide
a unique combination of values
• Queries become easier.
1NF Example
Employee (unnormalized)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C, Perl, Java
2       Barbara Jones  224      IT         Linux, Mac
3       Jake Rivera    201      R&D        DB2, Oracle, Java

Employee (1NF)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
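A minimal sketch of the 1NF step, assuming the unnormalized skills column is held as a comma-separated string (an assumption about how the repeating values are stored): each skill becomes its own row, and the other columns are copied. The function name to_1nf is hypothetical.

```python
# Unnormalized rows from the slide: the skills cell holds several values.
unnormalized = [
    (1, "Kevin Jacobs", 201, "R&D", "C, Perl, Java"),
    (2, "Barbara Jones", 224, "IT", "Linux, Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2, Oracle, Java"),
]


def to_1nf(rows):
    """Split the comma-separated skills cell so each row holds one atomic skill."""
    result = []
    for emp_no, name, dept_no, dept_name, skills in rows:
        for skill in skills.split(","):
            # strip() removes the space left after each comma.
            result.append((emp_no, name, dept_no, dept_name, skill.strip()))
    return result


employee_1nf = to_1nf(unnormalized)
for row in employee_1nf:
    print(row)
```

Every cell is now atomic, at the cost of repeating name, dept_no and dept_name on each skill row; the later normal forms deal with that repetition.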
2nd Normal Form
Each non-key attribute must be fully functionally dependent
on the whole primary key.
• Functional dependence - the property of one or more attributes that
uniquely determines the value of other attributes.
• Any attributes that depend on only part of the key are moved into a smaller (subset) table.
2NF improves data integrity.
• Reduces update, insert, and delete anomalies.
Functional Dependence
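• The columns name, dept_no and dept_name are functionally dependent
on emp_no. (emp_no -> name, dept_no, dept_name)
• skills is not functionally dependent on emp_no, since one emp_no can
have several skills. emp_no alone therefore cannot be the primary key;
the key of this table is (emp_no, skills).
• name, dept_no and dept_name depend on only part of that key (emp_no).
It is this partial dependency that is eliminated in 2NF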
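The dependencies above can be checked against the sample rows with a small helper. This is a hypothetical sketch (fd_holds is not a library function), and sample data can only show that a dependency is violated, never prove that it always holds.

```python
# Employee (1NF) sample rows, keyed by column name.
COLUMNS = ("emp_no", "name", "dept_no", "dept_name", "skills")
EMPLOYEE_1NF = [dict(zip(COLUMNS, row)) for row in [
    (1, "Kevin Jacobs", 201, "R&D", "C"),
    (1, "Kevin Jacobs", 201, "R&D", "Perl"),
    (1, "Kevin Jacobs", 201, "R&D", "Java"),
    (2, "Barbara Jones", 224, "IT", "Linux"),
    (2, "Barbara Jones", 224, "IT", "Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2"),
    (3, "Jake Rivera", 201, "R&D", "Oracle"),
    (3, "Jake Rivera", 201, "R&D", "Java"),
]]


def fd_holds(rows, lhs, rhs):
    """True if lhs -> rhs holds in rows: rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first rhs value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


print(fd_holds(EMPLOYEE_1NF, ["emp_no"], ["name", "dept_no", "dept_name"]))  # True
print(fd_holds(EMPLOYEE_1NF, ["emp_no"], ["skills"]))                        # False
print(fd_holds(EMPLOYEE_1NF, ["emp_no", "skills"], COLUMNS))                 # True
```

The last line confirms that (emp_no, skills) determines every column in the sample, which is why it serves as the key of the 1NF table.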
2NF
Employee (2NF)
emp_no  name           dept_no  dept_name
1       Kevin Jacobs   201      R&D
2       Barbara Jones  224      IT
3       Jake Rivera    201      R&D

Skills (2NF)
emp_no  skills
1       C
1       Perl
1       Java
2       Linux
2       Mac
3       DB2
3       Oracle
3       Java
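The 2NF split amounts to two projections of the 1NF table. The sketch below (project is a hypothetical helper; rows are held as plain tuples) builds both tables and then joins them back on emp_no to confirm that no information was lost.

```python
COLUMNS = ("emp_no", "name", "dept_no", "dept_name", "skills")
EMPLOYEE_1NF = [
    (1, "Kevin Jacobs", 201, "R&D", "C"),
    (1, "Kevin Jacobs", 201, "R&D", "Perl"),
    (1, "Kevin Jacobs", 201, "R&D", "Java"),
    (2, "Barbara Jones", 224, "IT", "Linux"),
    (2, "Barbara Jones", 224, "IT", "Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2"),
    (3, "Jake Rivera", 201, "R&D", "Oracle"),
    (3, "Jake Rivera", 201, "R&D", "Java"),
]


def project(rows, columns, keep):
    """Keep only the named columns and drop duplicate rows, preserving order."""
    idx = [columns.index(c) for c in keep]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(tuple(row[i] for i in idx) for row in rows))


employee_2nf = project(EMPLOYEE_1NF, COLUMNS, ["emp_no", "name", "dept_no", "dept_name"])
skills_2nf = project(EMPLOYEE_1NF, COLUMNS, ["emp_no", "skills"])

# Join on emp_no: re-attach each skill to its employee row.
rejoined = [emp + (skill,) for emp in employee_2nf
            for (emp_no, skill) in skills_2nf if emp[0] == emp_no]

print(len(employee_2nf), len(skills_2nf))        # 3 8
print(sorted(rejoined) == sorted(EMPLOYEE_1NF))  # True
```

Each employee's name and department now appear once instead of once per skill.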
2NF
• A relation is said to be in 2NF if it is already in 1NF and
every non-key attribute fully depends on the whole primary key of the
relation
• If a table has non-key attributes that depend on only part of
its primary key, then it is not in 2NF
Data Integrity
• Insert Anomaly - a new department cannot be added until at least one
employee is assigned to it, because every row needs an emp_no (part of the primary key).
• Update Anomaly - a single change must be repeated in many rows, and missing
one leaves the data inconsistent. E.g., changing dept_name IT to IS means updating every IT row.
• Delete Anomaly - deleting one fact removes other wanted information. E.g., deleting the IT
department removes employee Barbara Jones from the database, and deleting Barbara Jones
removes the only record of the IT department.
This is the purpose of the 3NF – to maintain data integrity!
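The three anomalies can be reproduced with sqlite3 on the Employee (1NF) table. The table name employee_1nf, the composite key (emp_no, skills) and the new department 300 'Sales' are assumptions made for illustration; NOT NULL is declared explicitly because SQLite otherwise allows NULLs in non-integer primary key columns.

```python
import sqlite3

ROWS = [
    (1, "Kevin Jacobs", 201, "R&D", "C"),
    (1, "Kevin Jacobs", 201, "R&D", "Perl"),
    (1, "Kevin Jacobs", 201, "R&D", "Java"),
    (2, "Barbara Jones", 224, "IT", "Linux"),
    (2, "Barbara Jones", 224, "IT", "Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2"),
    (3, "Jake Rivera", 201, "R&D", "Oracle"),
    (3, "Jake Rivera", 201, "R&D", "Java"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee_1nf (
        emp_no    INTEGER NOT NULL,
        name      TEXT,
        dept_no   INTEGER,
        dept_name TEXT,
        skills    TEXT NOT NULL,
        PRIMARY KEY (emp_no, skills)
    )
""")
conn.executemany("INSERT INTO employee_1nf VALUES (?, ?, ?, ?, ?)", ROWS)

# Insert anomaly: a department with no employees has no emp_no, so it cannot be stored.
try:
    conn.execute("INSERT INTO employee_1nf (dept_no, dept_name) VALUES (300, 'Sales')")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Update anomaly: renaming one department must touch every row for that department.
rows_updated = conn.execute(
    "UPDATE employee_1nf SET dept_name = 'IS' WHERE dept_no = 224").rowcount

# Delete anomaly: removing Barbara Jones (emp_no 2) also removes the only record of dept 224.
conn.execute("DELETE FROM employee_1nf WHERE emp_no = 2")
dept_224_left = conn.execute(
    "SELECT COUNT(*) FROM employee_1nf WHERE dept_no = 224").fetchone()[0]

print(insert_rejected, rows_updated, dept_224_left)  # True 2 0
```

With a separate Department table, each of these operations would touch exactly one row.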
Third Normal Form (3NF)
Remove transitive dependencies.
• Transitive (derived) dependency - a non-key attribute depends on another
non-key attribute rather than directly on the primary key; this usually means
two separate entities are stored in one table.
• Any transitive dependencies are moved into a smaller
(subset) table.
3NF further improves data integrity.
• Reduces the update, insert, and delete anomalies that remain after 2NF.
Transitive Dependence
Employee (2NF)
emp_no  name           dept_no  dept_name
1       Kevin Jacobs   201      R&D
2       Barbara Jones  224      IT
3       Jake Rivera    201      R&D

Note that dept_name is functionally dependent on dept_no.

dept_no is functionally dependent on emp_no, so via the
middle step of dept_no, dept_name is functionally dependent
on emp_no.

(emp_no -> dept_no, dept_no -> dept_name, thus emp_no -> dept_name)

This is what is called a transitive dependency, and it is what is
eliminated in 3NF
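The transitive step can be made mechanical by computing the attribute closure: start from a set of attributes and keep applying any functional dependency whose left side is already covered. The dependency list below restates the ones on this slide; the closure function is a hypothetical helper written for illustration.

```python
# Functional dependencies of Employee (2NF), as (left side, right side) pairs.
FDS = [
    (frozenset({"emp_no"}), frozenset({"name", "dept_no"})),
    (frozenset({"dept_no"}), frozenset({"dept_name"})),
]


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds (the attribute closure)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs once all of lhs is known and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


print(sorted(closure({"emp_no"}, FDS)))   # ['dept_name', 'dept_no', 'emp_no', 'name']
print(sorted(closure({"dept_no"}, FDS)))  # ['dept_name', 'dept_no']
```

dept_name enters the closure of emp_no only through dept_no, which is exactly the transitive dependency described above.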
3NF
Employee (3NF)
emp_no  name           dept_no
1       Kevin Jacobs   201
2       Barbara Jones  224
3       Jake Rivera    201

Department (3NF)
dept_no  dept_name
201      R&D
224      IT
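A sketch of the 3NF decomposition in SQL, run through sqlite3: the transitive dependency dept_no -> dept_name is moved into its own table with SELECT DISTINCT, and a join rebuilds the original rows. The table names are assumptions; note that CREATE TABLE ... AS SELECT does not copy primary key constraints, which a real schema would declare explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee_2nf (
    emp_no INTEGER PRIMARY KEY, name TEXT, dept_no INTEGER, dept_name TEXT)""")
conn.executemany("INSERT INTO employee_2nf VALUES (?, ?, ?, ?)", [
    (1, "Kevin Jacobs", 201, "R&D"),
    (2, "Barbara Jones", 224, "IT"),
    (3, "Jake Rivera", 201, "R&D"),
])

# Move dept_no -> dept_name into its own table; keep dept_no as a foreign key.
conn.executescript("""
    CREATE TABLE department AS
        SELECT DISTINCT dept_no, dept_name FROM employee_2nf;
    CREATE TABLE employee AS
        SELECT emp_no, name, dept_no FROM employee_2nf;
""")

departments = conn.execute(
    "SELECT dept_no, dept_name FROM department ORDER BY dept_no").fetchall()
print(departments)  # [(201, 'R&D'), (224, 'IT')]

# Joining on dept_no rebuilds the 2NF table exactly (a lossless decomposition).
rejoined = conn.execute("""
    SELECT e.emp_no, e.name, e.dept_no, d.dept_name
    FROM employee AS e JOIN department AS d ON e.dept_no = d.dept_no
    ORDER BY e.emp_no
""").fetchall()
original = conn.execute("SELECT * FROM employee_2nf ORDER BY emp_no").fetchall()
print(rejoined == original)  # True
```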
Other Normal Forms
Boyce-Codd Normal Form (BCNF)
• Strengthens 3NF by requiring the left-hand side (determinant) of every
non-trivial functional dependency to be a super key
Fourth Normal Form (4NF)
• Eliminate non-trivial multivalued dependencies (those not implied by a candidate key).
Fifth Normal Form (5NF)
• Eliminate join dependencies not implied by the candidate keys.
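A minimal BCNF test, assuming the listed dependencies are the complete set for each relation (for decomposed tables a full check would use the dependencies projected onto that table): every non-trivial dependency's left side must be a super key, i.e. its closure must contain all of the relation's attributes. The helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a super key of the relation."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attributes)
    ]


EMPLOYEE_2NF = {"emp_no", "name", "dept_no", "dept_name"}
FDS_2NF = [({"emp_no"}, {"name", "dept_no"}), ({"dept_no"}, {"dept_name"})]

EMPLOYEE_3NF = {"emp_no", "name", "dept_no"}
FDS_EMP_3NF = [({"emp_no"}, {"name", "dept_no"})]

DEPARTMENT_3NF = {"dept_no", "dept_name"}
FDS_DEPT_3NF = [({"dept_no"}, {"dept_name"})]

print(bcnf_violations(EMPLOYEE_2NF, FDS_2NF))         # [({'dept_no'}, {'dept_name'})]
print(bcnf_violations(EMPLOYEE_3NF, FDS_EMP_3NF))     # []
print(bcnf_violations(DEPARTMENT_3NF, FDS_DEPT_3NF))  # []
```

In this example the 3NF tables also satisfy BCNF. The two forms differ only when a dependency's left side is not a super key but its right side is part of a candidate key, which 3NF tolerates and BCNF does not.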
Normal Forms-summary
• First Normal Form
  ◦ Table has rows and columns
  ◦ Every row is unique
  ◦ Only one value is in each location
  ◦ Primary key is defined
• Second Normal Form
  ◦ Table should be in 1NF
  ◦ Columns that are not part of the primary key must be fully dependent on
    the whole primary key
  ◦ Each column is only searchable through its table’s primary key
  ◦ This further reduces redundancy and manages delete, update and
    insert anomalies
Normal Forms-summary
• Third Normal Form
  ◦ 3NF is also in 2NF (which is also in 1NF!)
  ◦ All non-key columns must depend on the primary key
  ◦ In 3NF, non-key columns depend on the primary key only, not on other non-key columns
  ◦ i.e. it is not possible to use any other (non-PK) column to find the value
    of a column
4NF (Fourth Normal Form)
• Fourth normal form (4NF) has two requirements:
1. Meet all the requirements of the third normal form.
2. Have no non-trivial multivalued dependencies (other than those implied by a candidate key).
• Multivalued dependencies occur when the presence of one or
more rows in a table implies the presence of one or more
other rows in that same table.
• Remember, these normalization guidelines are cumulative.
For a database to be in 2NF, for example, it must first fulfil all
the criteria of a 1NF database.
• Normalisation is a good idea but not an absolute requirement: some
applications (e.g. dynamic segmentation) require de-normalisation.
• The dynamic segmentation process enables multiple sets of attributes to
be associated with any portion of a line feature without segmenting the
underlying feature. In the transportation field, examples of such linearly
referenced data might include accident sites, road quality, and traffic
volume.
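To make the multivalued-dependency definition above concrete, the sketch checks X ->> Y directly from its definition. The table (emp_no, skill, language), in which an employee's skills and spoken languages are independent facts, is hypothetical and not part of the lecture's example; mvd_holds is likewise a hypothetical helper.

```python
from itertools import product


def mvd_holds(rows, columns, x, y):
    """Check the multivalued dependency x ->> y directly from its definition.

    For every pair of rows t1, t2 that agree on x, the row that takes x and y
    from t1 and all other columns from t2 must also be present.
    """
    xi = [columns.index(c) for c in x]
    from_t1 = set(xi) | {columns.index(c) for c in y}
    present = set(rows)
    for t1, t2 in product(rows, repeat=2):
        if all(t1[i] == t2[i] for i in xi):
            t3 = tuple(t1[i] if i in from_t1 else t2[i] for i in range(len(columns)))
            if t3 not in present:
                return False
    return True


# Hypothetical table: skills and languages are independent facts about an employee.
COLUMNS = ("emp_no", "skill", "language")
ROWS = [
    (1, "C", "English"), (1, "C", "French"),
    (1, "Perl", "English"), (1, "Perl", "French"),
    (2, "Linux", "English"),
]
incomplete = [r for r in ROWS if r != (1, "Perl", "French")]

print(mvd_holds(ROWS, COLUMNS, ["emp_no"], ["skill"]))        # True
print(mvd_holds(incomplete, COLUMNS, ["emp_no"], ["skill"]))  # False
```

Because emp_no ->> skill holds while emp_no is not a key, the full table is not in 4NF: every new language forces one extra row per skill. Splitting it into (emp_no, skill) and (emp_no, language) removes that dependency.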
Read about: Repeating Groups

• http://www.basicsofcomputer.com/modeling_repeating_groups.htm
• ftp://ftp.cba.uri.edu/classes/horton/Class%20Notes/Lecture%20Notes%20-%20Chapter%205%20(part%202).doc