
Normalization

1NF, 2NF, 3NF & 4NF

G.T.
Normalization of Database
Database normalization is a technique for organizing the data in
a database. It is a systematic approach to decomposing tables in
order to eliminate data redundancy (repetition) and undesirable
characteristics such as insertion, update and deletion anomalies.
It is a multi-step process that puts data into tabular form and
removes duplicated data from the related tables.
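
To make the idea of an update anomaly concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, customer names and phone numbers are made up for illustration and are not taken from the example table in this document.

```python
import sqlite3

# Hypothetical un-normalized table: the supplier phone is repeated on every row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer TEXT, item TEXT, supplier TEXT, supplier_phone TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        ("Customer A", "PS1", "Sony", "555-0100"),  # made-up values
        ("Customer B", "PS4", "Sony", "555-0100"),
    ],
)

# Someone updates the phone number on one row and misses the other.
conn.execute("UPDATE purchases SET supplier_phone = '555-0199' WHERE item = 'PS1'")

# The same supplier now has two different phone numbers: an update anomaly.
phones = {
    row[0]
    for row in conn.execute(
        "SELECT supplier_phone FROM purchases WHERE supplier = 'Sony'"
    )
}
print(sorted(phones))
```

Because the phone number is stored once per purchase rather than once per supplier, nothing stops the copies from drifting apart.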

Normalization is used mainly for two purposes:

• Eliminating redundant (repeated) data.
• Ensuring data dependencies make sense, i.e. data is logically
stored.

This document covers four normal forms: 1st, 2nd, 3rd & 4th. Further normal forms exist but are not covered here.
Normalization Explained

Look at the above example carefully. Each row represents a record,
and each record has problems.

Normalization starts with data being atomic (data atomicity). Simply
put, each cell holds a specific type of data and only one value.
That and only that.

When normalizing, we need to begin with the 1st normal form (1NF).
Each normal form has rules.
1NF
Rules:
- Each cell is to be single valued.
- Entries in a column are to be of the same type.
- Rows are uniquely identifiable (add more columns if necessary).

Look at our data.

Each cell is to be single valued: An example of this violation is the
purchase for Peppa Skull, which lists both Xbox One and PS 1 in one cell.
Another example is Peppa Skull's subscription to two newsletters,
Xbox News & Play Station News. All such violations should be corrected.

Entries in a column are to be of the same type: Look at the Supplier
and the Supplier Phone for Mr. Skull's purchase. These are violations
as well. An actual supplier and an actual phone number are required.
Rows are uniquely identifiable: Look at the example of Peppa Skull's
purchase. This record contains two items purchased, two newsletters
and other 1NF infractions. For rows to be uniquely identifiable, cells to be
single valued and column entries to be of the same type, this record
needs to be addressed. To normalize this type of occurrence we do the
following.

Now rows are uniquely identifiable, cells are single valued and column
entries are of the same type. The troublesome record has been dealt with
and we now adhere to all the rules of 1NF.
Note: Based on their addresses, the two Joe Johcrows entries are 2 different individuals.
Mr Skull's single purchase is now one record.
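
One way to picture the "each cell is single valued" fix is the small Python sketch below. It is only an illustration: the helper name split_multivalued, the Cust ID value and the comma separator are assumptions, and only the customer and item names come from the example.

```python
def split_multivalued(rows, column, sep=","):
    """Return new rows in which `column` holds exactly one value per row."""
    result = []
    for row in rows:
        for value in str(row[column]).split(sep):
            new_row = dict(row)          # copy the other columns unchanged
            new_row[column] = value.strip()
            result.append(new_row)
    return result


# Hypothetical raw record: two items crammed into one cell (a 1NF violation).
raw_rows = [{"cust_id": 1, "name": "Peppa Skull", "item": "Xbox One, PS 1"}]

flat_rows = split_multivalued(raw_rows, "item")
for row in flat_rows:
    print(row)
```

After the split, each row holds one item, and the pair (cust_id, item) can be used to tell the rows apart.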
2NF
Rules:
- All attributes (non-key columns) must be dependent on the table key.

Look at our data.


All table attributes must be dependent on the table key: Examples of
this violation are Item, Supplier, Supplier Phone and Price, which are
not dependent on Cust ID. An item, its price and its supplier have nothing
at all to do with the customer. The only time the two interact is when
a customer is purchasing an item.

In essence, a customer can exist without an item and vice versa.
However, an item has to be linked to a supplier or
manufacturer.

We now have to add an additional table for the unrelated data.

One table now becomes 2.
Notice, however, that the store carries 3 items. Each item has a supplier. A single
supplier can, however, supply multiple items. This is a one-to-many
relationship. A customer can make multiple purchases, purchasing multiple items
each time. This is also a one-to-many relationship (one customer, many purchases).

The 2NF criterion, that all attributes (non-key columns) must be dependent on
the table key, has been met. However, we seem to have lost all the orders.
We don't know who bought what. A junction table is what is needed. This is
where information from both tables meets. Basically, it is a table showing who
bought what.
Two tables now become 3. The third table is a reflection of all orders by
clients, inclusive of the double purchase by Peppa Skull. In our new table,
however, we have no primary key set. If a primary key were to be set, it would
have to be a composite key, which is made by combining 2 or more fields
to form a unique identifier.
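
A junction table with a composite key could be sketched as follows with Python's sqlite3 module. The table and column names, the ID values, prices and phone numbers are assumptions made for illustration; only the customer, item and supplier names come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE items (
    item_id INTEGER PRIMARY KEY, item TEXT NOT NULL,
    supplier TEXT, supplier_phone TEXT, price REAL
);
-- Junction table: the composite key is the pair (cust_id, item_id).
CREATE TABLE orders (
    cust_id INTEGER NOT NULL REFERENCES customers(cust_id),
    item_id INTEGER NOT NULL REFERENCES items(item_id),
    PRIMARY KEY (cust_id, item_id)
);
INSERT INTO customers VALUES (1, 'Peppa Skull');
INSERT INTO items VALUES (1, 'Xbox One', 'Microsoft', '555-0101', 300.0);
INSERT INTO items VALUES (2, 'PS 1', 'Sony', '555-0100', 50.0);
INSERT INTO orders VALUES (1, 1);
INSERT INTO orders VALUES (1, 2);
""")

# Who bought what: join the junction table back to both sides.
bought = conn.execute("""
    SELECT c.name, i.item
    FROM orders o
    JOIN customers c ON c.cust_id = o.cust_id
    JOIN items i ON i.item_id = o.item_id
    ORDER BY i.item_id
""").fetchall()
print(bought)

# The composite key rejects a second identical (customer, item) row.
try:
    conn.execute("INSERT INTO orders VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Note that with this key a customer cannot buy the same item twice. A real order table would probably add an order number or date to the key, which is outside what the example shows.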
At this stage we have satisfied the rule of 2NF, which states that all attributes (non-key columns) must be
dependent on the table key.
Consider this, though: what if the store started selling more Microsoft items and more Sony items?
This would mean that all the information for supplier Sony or Microsoft would be repeated in the
items table. If this were the case, we could have an Item table and a Supplier table. This
would work because Supplier and Supplier Phone can go together, and Item and Price can go
together, with the two tables linked by the Supplier ID.
3NF
Rules:
- All fields can be determined only by the key in its table and by no other column.

This rule is fairly simple. The only violation of this rule is found in the table with the items. Can Sony's
telephone number be determined by PS1? Yes, but PS4 determines the same number, because the phone
number really depends on the supplier and not on the item. The solution to this issue was mentioned
earlier: further separate the table with the items in it.
Think on this. If we got new items from Sony, for each item we would have to repeat the supplier's
contact details. That's unnecessary repetition. What if you suddenly had to change the contact number
for Sony? Then you would have to find every single Sony entry in your database, right?
Now if the phone number is to be updated for a supplier, we can make one
update and it is reflected for every item supplied. Another advantage of doing
this is that a bogus supplier cannot be entered: if a supplier
doesn't exist then it can't supply an item, and if an item is being sold in store,
it must have been supplied by an already established supplier.
Now all fields can be determined only by the key in its table and by no other
column.
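
The split into an item table and a supplier table might look like the sketch below, again using Python's sqlite3 module. Column names, IDs, prices and phone numbers are hypothetical; the foreign key is one possible way to get the "no bogus supplier" behaviour described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce the link
conn.executescript("""
CREATE TABLE suppliers (
    supplier_id INTEGER PRIMARY KEY,
    supplier TEXT NOT NULL,
    supplier_phone TEXT NOT NULL
);
CREATE TABLE items (
    item_id INTEGER PRIMARY KEY,
    item TEXT NOT NULL,
    price REAL NOT NULL,
    supplier_id INTEGER NOT NULL REFERENCES suppliers(supplier_id)
);
INSERT INTO suppliers VALUES (1, 'Sony', '555-0100');
INSERT INTO suppliers VALUES (2, 'Microsoft', '555-0101');
INSERT INTO items VALUES (1, 'PS1', 50.0, 1);
INSERT INTO items VALUES (2, 'PS4', 300.0, 1);
INSERT INTO items VALUES (3, 'Xbox One', 300.0, 2);
""")

# One update in the supplier table...
conn.execute("UPDATE suppliers SET supplier_phone = '555-0199' WHERE supplier = 'Sony'")

# ...is seen by every Sony item through the join.
sony_items = conn.execute("""
    SELECT i.item, s.supplier_phone
    FROM items i JOIN suppliers s ON s.supplier_id = i.supplier_id
    WHERE s.supplier = 'Sony'
    ORDER BY i.item_id
""").fetchall()
print(sony_items)

# An item pointing at a supplier that does not exist is rejected.
try:
    conn.execute("INSERT INTO items VALUES (4, 'Mystery Console', 10.0, 99)")
    bogus_rejected = False
except sqlite3.IntegrityError:
    bogus_rejected = True
print(bogus_rejected)
```

The phone number now lives in exactly one row, so there is nothing to keep in sync when it changes.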
4NF
Rules:
- No multi-valued dependencies

A multi-valued dependency is basically this: Customer ID, Customer Name and
Shipping Address are without a doubt linked. Newsletter represents the subscription to a
newsletter. If a customer decides to subscribe to all 100 newsletters offered, then his ID,
name and address will be in the table 100 times. This is redundancy of address and
name, wouldn't you agree? So two unrelated sets of values, the address and the
newsletters, are dependent on the one primary key. To fix this we need to
separate this information, because a multi-valued dependency could become a
problem in a large, complex database.
Now a customer can subscribe to any number of newsletters without duplicating
unnecessary info. To change a customer's address, simply change it one time in the
customer table and it'll reflect everywhere. The database has now been successfully
normalized to the fourth normal form.
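
As a rough sketch of the separated newsletter table, the Python sqlite3 example below keeps name and address in the customer table once and stores one row per subscription. The table names, the Cust ID value and both addresses are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    cust_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    shipping_address TEXT NOT NULL
);
-- One row per (customer, newsletter); name and address are not repeated here.
CREATE TABLE subscriptions (
    cust_id INTEGER NOT NULL REFERENCES customers(cust_id),
    newsletter TEXT NOT NULL,
    PRIMARY KEY (cust_id, newsletter)
);
INSERT INTO customers VALUES (1, 'Peppa Skull', '1 Example Street');
INSERT INTO subscriptions VALUES (1, 'Xbox News');
INSERT INTO subscriptions VALUES (1, 'Play Station News');
""")

# Change the address once, in the customer table only.
conn.execute("UPDATE customers SET shipping_address = '2 Sample Road' WHERE cust_id = 1")

# Every subscription sees the new address through the join.
rows = conn.execute("""
    SELECT c.shipping_address, s.newsletter
    FROM customers c JOIN subscriptions s ON s.cust_id = c.cust_id
    ORDER BY s.newsletter
""").fetchall()
print(rows)

# The address itself is stored exactly once, however many newsletters there are.
address_copies = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE cust_id = 1"
).fetchone()[0]
print(address_copies)
```

Subscribing to more newsletters only adds short rows to the subscription table, so the customer's name and address are never duplicated.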
