
Normalisation

What is Normalization?
Normalization is a formal process for determining which fields
belong in which tables in a relational database. Through
normalization, a single complex record structure is replaced by a
set of successive record structures that are simpler, more
predictable, and therefore more manageable.

Normalisation is carried out for the following reasons:

1. To structure the data so that any pertinent (relevant)
relationships between entities can be represented.
2. To permit simple retrieval of data in response to query
and report requests.
3. To simplify the maintenance of the data through
updates, insertions, and deletions.
4. To reduce the need to restructure or reorganize data
when new application requirements arise.

A normalized relational database provides several benefits:

 Elimination of redundant data storage.
 Decomposition of all data groups into two-dimensional
records.
 Elimination of any relationships in which data elements
do not fully depend on the primary key of the record.
 Elimination of any relationships that contain a transitive
dependency.

Normalization ensures that you get the benefits relational
databases offer.

Design vs. Implementation
Designing a database structure and implementing a database structure
are different tasks.

 When you design a structure, it should be described without
reference to the specific database tool you will use to implement
the system, or what concessions you plan to make for
performance reasons. These steps come later.
 After you’ve designed the database structure abstractly, you then
implement it in a particular environment.
 Too often, people new to database design combine design and
implementation in one step. Implementing a structure without
designing it quickly leads to flawed structures that are difficult
and costly to modify.
 Design first, implement second, and you’ll finish faster and
cheaper.

Normalized Design: Pros and Cons

We’ve implied that there are various advantages to producing a
properly normalized design before you implement your system. Let’s
look at a detailed list of the pros and cons:

Pros of Normalizing:
 More efficient database structure.
 Better understanding of your data.
 More flexible database structure.
 Easier-to-maintain database structure.
 Few (if any) costly surprises down the road.
 Validates your common sense and intuition.
 Avoids redundant fields.
 Ensures that distinct tables exist when necessary.

Cons of Normalizing:
 You can’t start building the database before you know what
the user needs.

Here, you can see the pros outweigh the cons.

Terminology

Primary Key

 The primary key is a fundamental concept in relational
database design.
 It’s an easy concept: each record should have something that
identifies it uniquely.
 The primary key can be a single field, or a combination of fields.
 A table’s primary key also serves as the basis of relationships
with other tables. For example, it is typical to relate invoices to
a unique customer ID, and employees to a unique department
ID.
 A primary key should be unique, mandatory, and permanent.
 A classic mistake people make when learning to create
relational databases is to use a volatile field as the primary key.
For example, consider this table:

[Companies]
Company Name
Address

 Company Name is an obvious candidate for the primary key.
Yet this is a bad idea, even if the Company Name is unique.
 What happens when the name changes after a merger?
Not only do you have to change this record, you have to update
every single related record, since the key has changed.

Another common mistake is to select a field that is usually unique
and unchanging. Consider this small table:

[People]
Social Security Number
First Name
Last Name
Date of Birth

In the United States all workers have a Social Security Number that
uniquely identifies them for tax purposes. Or does it? As it turns out,
not everyone has a Social Security Number, some people’s Social
Security Numbers change, and some people have more than one. This
is an appealing but untrustworthy key.

The correct way to build a primary key is with a unique and
unchanging value.
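One common way to obtain a unique and unchanging value is a surrogate key: an arbitrary number with no business meaning, so it never has a reason to change. The sketch below illustrates the idea with SQLite from Python’s standard library; the table and sample data are invented for illustration, not taken from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Companies (
        CompanyID   INTEGER PRIMARY KEY,  -- surrogate key: unique, mandatory, permanent
        CompanyName TEXT NOT NULL,        -- volatile: may change after a merger
        Address     TEXT
    )
""")
conn.execute("INSERT INTO Companies (CompanyName, Address) VALUES (?, ?)",
             ("Acme Ltd", "12 High St"))

# The name can change freely; related records keyed on CompanyID are untouched.
conn.execute("UPDATE Companies SET CompanyName = 'Acme Holdings' WHERE CompanyID = 1")
row = conn.execute("SELECT CompanyID, CompanyName FROM Companies").fetchone()
print(row)  # (1, 'Acme Holdings')
```

Because relationships reference CompanyID rather than the company name, a merger changes exactly one field in one record.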

Functional Dependency

Closely tied to the notion of a key is a special normalization concept
called functional dependence, or functional dependency. The second
and third normal forms verify that your functional dependencies are
correct.

So what is a “functional dependency”? It describes how one field (or
combination of fields, called a composite) determines another field.
Consider an example:

[ZIP Codes]
ZIP Code
City
County
State Abbreviation
State Name

ZIP Code is a unique 5-digit key. What makes it a key? It is a key
because it determines the other fields. For each ZIP Code there is a
single city, county, and state abbreviation. These fields are functionally
dependent on the ZIP Code field. In other words, they belong with this
key. Now look at the last two fields, State Abbreviation and State Name.
State Abbreviation determines State Name; in other words, State Name
is functionally dependent on State Abbreviation. State Abbreviation is
acting like a key for the State Name field. Aha! State Abbreviation is
a key, so it belongs in another table. As we’ll see, the third normal
form tells us to create a new States table and move State Name into it.
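A functional dependency A → B can be checked mechanically in sample data: it holds if no value of A is ever paired with two different values of B. The sketch below does exactly that; the ZIP values are invented for illustration, and note that sample data can only refute a dependency, never prove it holds for all future rows.

```python
def holds(rows, determinant, dependent):
    """Return True if the functional dependency determinant -> dependent
    holds in the given rows (list of dicts)."""
    seen = {}
    for row in rows:
        key = row[determinant]
        if key in seen and seen[key] != row[dependent]:
            return False  # same determinant value, two different dependent values
        seen[key] = row[dependent]
    return True

zip_codes = [
    {"zip": "10001", "city": "New York", "state_abbr": "NY", "state_name": "New York"},
    {"zip": "10002", "city": "New York", "state_abbr": "NY", "state_name": "New York"},
    {"zip": "90210", "city": "Beverly Hills", "state_abbr": "CA", "state_name": "California"},
]

print(holds(zip_codes, "zip", "city"))               # True: ZIP Code determines City
print(holds(zip_codes, "state_abbr", "state_name"))  # True: the dependency 3NF moves to a States table
print(holds(zip_codes, "city", "zip"))               # False: one city spans many ZIP Codes
```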
Normal Forms

The principles of normalization are described in a series of
progressively stricter “normal forms”. First normal form (1NF) is the
easiest to satisfy; second normal form (2NF) is more difficult; and so on.
There are five or six normal forms, depending on whom you read. It is
convenient to talk about the normal forms by their traditional names,
since this terminology is ubiquitous in the relational database industry.
It is, however, possible to approach normalization without using this
language. For example, Michael Hernandez’s helpful Database
Design for Mere Mortals uses plain language. Whatever terminology
you use, the most important thing is that you go through the process.
First Normal Form (1NF)
The first normal form is easy to understand and apply:
A table is in first normal form if it contains no repeating groups.
What is a repeating group, and why is it bad? When you have
more than one field storing the same kind of information in a single
table, you have a repeating group. Repeating groups are the right way
to build a spreadsheet, the only way to build a flat-file database, and
the wrong way to build a relational database. Here is a common
example of a repeating group:
[Customers]
Customer ID
Customer Name
Contact Name 1
Contact Name 2
Contact Name 3

What’s wrong with this approach? Well, what happens when you have a
fourth contact? You have to add a new field, modify your forms, and
rebuild your routines. What happens when you want to query or report
based on all contacts across all customers? It takes a lot of custom
code, and may prove too difficult in practice. The structure we’ve just
shown makes perfect sense in a spreadsheet, but almost no sense in a
relational database. All of the difficulties we’ve described are resolved
by moving contacts into a related table.

[Customers]
Customer ID
Customer Name

[Contacts]
Customer ID (this field relates [Contacts] and [Customers])
Contact ID
Contact Name

Second Normal Form (2NF)

The second normal form helps identify when you’ve combined two
tables into one. Second normal form depends on the concepts of the
primary key and functional dependency. The second normal form is:

A relation is in second normal form (2NF) if and only if it is in 1NF
and every nonkey attribute is fully dependent on the primary key.
— C.J. Date, An Introduction to Database Systems

In other words, your table is in 2NF if:

1) It doesn’t have any repeating groups.
2) Each of the fields that isn’t a part of the key is functionally
dependent on the entire key.

If a single-field key is used, a 1NF table is already in 2NF.

Third Normal Form (3NF)

Third normal form performs an additional level of verification that you
have not combined tables. Here are two different definitions of the
third normal form:

A table should have a field that uniquely identifies each of its
records, and each field in the table should describe the subject that
the table represents.
— Michael J. Hernandez, Database Design for Mere Mortals

To test whether a 2NF table is also in 3NF, we ask, “Are any of the
non-key columns dependent on any other non-key columns?”
— Chris Gane, Computer Aided Software Engineering

When designing a database it is easy enough to accidentally combine
information that belongs in different tables. In the ZIP Code example
mentioned above, the ZIP Code table included the State Abbreviation
and the State Name. The State Name is determined by the State
Abbreviation, so the third normal form reminds you to move this field
into a new table. Here’s how these tables should be set up:

[ZIP Codes]
ZIP Code
City
County
State Abbreviation

[States]
State Abbreviation
State Name
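The two structures above can be sketched directly in SQLite, with a foreign key making the relationship explicit. Column names follow the tables above; the sample rows are invented. The join shows that nothing is lost by the split: the original view is recoverable, while each state name is stored exactly once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE States (
        StateAbbreviation TEXT PRIMARY KEY,
        StateName         TEXT NOT NULL
    );
    CREATE TABLE ZipCodes (
        ZipCode           TEXT PRIMARY KEY,
        City              TEXT NOT NULL,
        County            TEXT NOT NULL,
        StateAbbreviation TEXT NOT NULL REFERENCES States(StateAbbreviation)
    );
    INSERT INTO States VALUES ('NY', 'New York'), ('CA', 'California');
    INSERT INTO ZipCodes VALUES ('10001', 'New York', 'New York', 'NY');
""")

# State Name is stored once, in States; a join recovers the original view.
row = conn.execute("""
    SELECT z.ZipCode, z.City, s.StateName
    FROM ZipCodes z JOIN States s USING (StateAbbreviation)
""").fetchone()
print(row)  # ('10001', 'New York', 'New York')
```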

Higher Normal Forms

There are several higher normal forms, including 4NF, 5NF, BCNF, and
PJ/NF. We’ll leave our discussion at 3NF, which is adequate for most
practical needs. If you are interested in higher normal forms, consult a
book like Date’s An Introduction to Database Systems.
Illustrations

Before Normalization

Order No.  Cust. No.  Cust. Name    Address                           Order Date  Item No.  Item Description  Item Price  Qty. Ordered  Total Order Cost
101426     AK100      Arun Kumar    25, Car St., Triplicane, Chennai  20/04/09    TA100     Table Clothes     270.00      100           27000
102356     SU100      Sukumaran     16, Rajaji St, Chennai            03/05/09    BL200     Blankets          300.00      10            6450
                                                                                  CU200     Curtains-Window   150.00      5
                                                                                  TA100     Table Clothes     270.00      10
102569     SW100      Swarna & Co.  27, Garden St, Chennai – 34       07/06/09    BL200     Blankets          300.00      5             7500
                                                                                  CU100     Curtains-Window   150.00      5
                                                                                  SP200     Spreadsheet       350.00      10
                                                                                  TA100     Table Clothes     270.00      5
102589     ET100      Ethiraj       20, Dowing Street, Chennai        19/07/09    BL200     Blankets          300.00      10            7600
                                                                                  CU100     Curtains-Window   150.00      10
                                                                                  SP200     Spreadsheet       350.00      5
                                                                                  TA100     Table Clothes     270.00      5
This table is before normalization, i.e., it is neither flexible nor easy to use. The items are redundant: they appear
several times, which leads to confusion. Moreover, for each and every order we have to re-enter the customer details,
which is a very cumbersome process. There will also be problems in modifying the customer details and item details
later on. This table is highly unorganized, so we have to split it into smaller tables to facilitate processing. This calls
for normalization.
First Normalization
1. Remove all repeating groups so that each record is of fixed
length. In the above case, for one order number there are several
item details.

Order Record

Order No.  Cust. No.  Cust. Name    Address                           Order Date  Total Cost
101426     AK100      Arun Kumar    25, Car St., Triplicane, Chennai  20/04/09    27000
102356     SU100      Sukumaran     16, Rajaji St, Chennai            03/05/09    6450
102569     SW100      Swarna & Co.  27, Garden St, Chennai – 34       07/06/09    7500
102589     ET100      Ethiraj       20, Dowing Street, Chennai        19/07/09    7600

Items Purchased Record

Order No.  Item No.  Item Description  Item Price  Qty. Ordered
101426     TA100     Table Clothes     270.00      100
102356     BL200     Blankets          300.00      10
102356     CU200     Curtains-Window   150.00      5
102356     TA100     Table Clothes     270.00      10
102569     BL200     Blankets          300.00      5
102569     CU100     Curtains-Window   150.00      5
102569     SP200     Spreadsheet       350.00      10
102569     TA100     Table Clothes     270.00      5
102589     BL200     Blankets          300.00      10
102589     CU100     Curtains-Window   150.00      10
102589     SP200     Spreadsheet       350.00      5
102589     TA100     Table Clothes     270.00      5

Now each record is fixed in length and does not contain any repeating groups.

Second Normal Form

Each data item in a record is fully functionally dependent on the record key.

Order Record

Order No.  Cust. No.  Order Date  Total Cost
101426     AK100      20/04/09    27000
102356     SU100      03/05/09    6450
102569     SW100      07/06/09    7500
102589     ET100      19/07/09    7600

Customer Record

Cust. No.  Cust. Name    Address
AK100      Arun Kumar    25, Car St., Triplicane, Chennai
SU100      Sukumaran     16, Rajaji St, Chennai
SW100      Swarna & Co.  27, Garden St, Chennai – 34
ET100      Ethiraj       20, Dowing Street, Chennai
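The normalized order structure can be sketched in SQLite. The schema below follows the decomposition above (customers, orders, and order items in separate tables, with the customer details stored once); the column types are assumptions, and only one order from the illustration is loaded as sample data. A join rebuilds the original report view on demand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        CustNo   TEXT PRIMARY KEY,
        CustName TEXT NOT NULL,
        Address  TEXT NOT NULL
    );
    CREATE TABLE Orders (
        OrderNo   INTEGER PRIMARY KEY,
        CustNo    TEXT NOT NULL REFERENCES Customers(CustNo),
        OrderDate TEXT NOT NULL,
        TotalCost REAL NOT NULL
    );
    CREATE TABLE OrderItems (
        OrderNo INTEGER NOT NULL REFERENCES Orders(OrderNo),
        ItemNo  TEXT NOT NULL,
        Price   REAL NOT NULL,
        Qty     INTEGER NOT NULL,
        PRIMARY KEY (OrderNo, ItemNo)
    );
    INSERT INTO Customers VALUES ('SU100', 'Sukumaran', '16, Rajaji St, Chennai');
    INSERT INTO Orders VALUES (102356, 'SU100', '03/05/09', 6450);
    INSERT INTO OrderItems VALUES
        (102356, 'BL200', 300.00, 10),
        (102356, 'CU200', 150.00, 5),
        (102356, 'TA100', 270.00, 10);
""")

# The customer's details are stored once; a join rebuilds the report view.
rows = conn.execute("""
    SELECT o.OrderNo, c.CustName, i.ItemNo, i.Price, i.Qty
    FROM Orders o
    JOIN Customers c ON c.CustNo = o.CustNo
    JOIN OrderItems i ON i.OrderNo = o.OrderNo
    ORDER BY i.ItemNo
""").fetchall()
for r in rows:
    print(r)
```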
As a further illustration, consider a library database. First, we normalize the book table with its attributes and have this table:

List of attributes  1NF          2NF          3NF
BOOK_NO             BOOK_NO      BOOK_NO      BOOK_NO
BOOK_NAME           BOOK_NAME    BOOK_NAME    BOOK_NAME
SIZE                SIZE         SIZE         SIZE
TIME_PUBL           TIME_PUBL    TIME_PUBL    TIME_PUBL
YEAR_PUBL           YEAR_PUBL    YEAR_PUBL    YEAR_PUBL
PUB_HOUSE           PUB_HOUSE    PUB_HOUSE    PUB_HOUSE
COST                COST         COST         COST
AUTHOR              AUTHOR       AUTHOR       AUTHOR
CHIEF_AUTH          CHIEF_AUTH   CHIEF_AUTH   CHIEF_AUTH
COMP_AUTH           COMP_AUTH    COMP_AUTH    COMP_AUTH
REV_AUTH            REV_AUTH     REV_AUTH     REV_AUTH
LANG_NO             LANG_NO      LANG_NO      LANG_NO
LANG_NAME           LANG_NAME    LANG_NAME    NATION_NO
LANG_VN             LANG_VN      LANG_VN      SPEC_NO
LANG_SYS            LANG_SYS     LANG_SYS     COLL_NO
NATION_NO           NATION_NO    NATION_NO    KW_MASTER
NATION_NAME         NATION_NAME  NATION_NAME  KW_SLAVE
NATION_VN           NATION_VN    NATION_VN    COMMENT
SPEC_NO             SPEC_NO      SPEC_NO
SPEC_NAME           SPEC_NAME    SPEC_NAME    LANG_NO
COLL_NO             COLL_NO      COLL_NO      LANG_NAME
COLL_NAME           COLL_NAME    COLL_NAME    LANG_VN
KW_MASTER           KW_MASTER    KW_MASTER    LANG_SYS
KW_SLAVE            KW_SLAVE     KW_SLAVE
COMMENT             COMMENT      COMMENT      NATION_NO
                                              NATION_NAME
                                              NATION_VN

                                              SPEC_NO
                                              SPEC_NAME

                                              COLL_NO
                                              COLL_NAME
Secondly, we normalize the reader table with its attributes and have this table:

List of attributes  1NF          2NF          3NF
READER_NO           READER_NO    READER_NO    READER_NO
READER_NAME         READER_NAME  READER_NAME  READER_NAME
ADDRESS             ADDRESS      ADDRESS      DEPT_NO
BIRTH_DATE          BIRTH_DATE   BIRTH_DATE   ADDRESS
DEPT_NO             DEPT_NO      DEPT_NO      BIRTH_DATE
DEPT_NAME           DEPT_NAME    DEPT_NAME    COMMENT
COMMENT             COMMENT      COMMENT
                                              DEPT_NO
                                              DEPT_NAME

Thirdly, we normalize the book ticket_L&GB table with its attributes:

List of attributes  1NF          2NF          3NF
READER_NO           READER_NO    READER_NO    READER_NO
READER_NAME         READER_NAME  BOOK_NO      BOOK_NO
BOOK_NO             BOOK_NO      BORROW_DATE  BORROW_DATE
BOOK_NAME           BOOK_NAME    RETURN_DATE  RETURN_DATE
BORROW_DATE         BORROW_DATE  COMMENT      COMMENT
RETURN_DATE         RETURN_DATE
COMMENT             COMMENT      READER_NO    READER_NO
                                 READER_NAME  READER_NAME

                                 BOOK_NO      BOOK_NO
                                 BOOK_NAME    BOOK_NAME
Next, for the magazine table, we notice that the list of attributes has a
repeating group. It includes these attributes: MAG_HEAD_NO, MAG_NAME,
START_YEAR, MAG_SHEFL, ISSN_NO, PUB_HOUSE, LANG_NO,
LANG_NAME, LANG_VN, LANG_SYS, NATION_NO, NATION_NAME,
NATION_VN, SPEC_NO, SPEC_NAME, COLL_NO, COLL_NAME, and
COMMENT, so we decompose it into a new table with the primary key
MAG_HEAD_NO, without data loss. The remaining attributes (MAG_HEAD_NO,
MAG_DETL_NO, YEAR, VOLUME, NUMBER, MONTH, QUANTITY) are put in
another table with a primary key consisting of two attributes: MAG_HEAD_NO
and MAG_DETL_NO. We then have the table below:

List of attributes  1NF          2NF          3NF
MAG_HEAD_NO         MAG_HEAD_NO  MAG_HEAD_NO  MAG_HEAD_NO
MAG_DETL_NO         MAG_NAME     MAG_NAME     MAG_NAME
MAG_NAME            START_YEAR   START_YEAR   START_YEAR
START_YEAR          MAG_SHEFL    MAG_SHEFL    MAG_SHEFL
MAG_SHEFL           ISSN_NO      ISSN_NO      ISSN_NO
ISSN_NO             PUB_HOUSE    PUB_HOUSE    PUB_HOUSE
PUB_HOUSE           LANG_NO      LANG_NO      LANG_NO
LANG_NO             LANG_NAME    LANG_NAME    NATION_NO
LANG_NAME           LANG_VN      LANG_VN      SPEC_NO
LANG_VN             LANG_SYS     LANG_SYS     COLL_NO
LANG_SYS            NATION_NO    NATION_NO    COMMENT
NATION_NO           NATION_NAME  NATION_NAME
NATION_NAME         NATION_VN    NATION_VN    LANG_NO
NATION_VN           SPEC_NO      SPEC_NO      LANG_NAME
SPEC_NO             SPEC_NAME    SPEC_NAME    LANG_VN
SPEC_NAME           COLL_NO      COLL_NO      LANG_SYS
COLL_NO             COLL_NAME    COLL_NAME
COLL_NAME           COMMENT      COMMENT      NATION_NO
YEAR                                          NATION_NAME
VOLUME              MAG_HEAD_NO  MAG_HEAD_NO  NATION_VN
NUMBER              MAG_DETL_NO  MAG_DETL_NO
MONTH               YEAR         YEAR         SPEC_NO
QUANTITY            VOLUME       VOLUME       SPEC_NAME
COMMENT             NUMBER       NUMBER
                    MONTH        MONTH        COLL_NO
                    QUANTITY     QUANTITY     COLL_NAME

                                              MAG_HEAD_NO
                                              MAG_DETL_NO
                                              YEAR
                                              VOLUME
                                              NUMBER
                                              MONTH
                                              QUANTITY
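The decomposition described above (a magazine heading table keyed on MAG_HEAD_NO, and a detail table keyed on the pair MAG_HEAD_NO + MAG_DETL_NO) can be sketched in SQLite. The text gives only attribute names, so the column types and the sample row are assumptions; the point is the composite primary key of two attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE MagHead (
        MAG_HEAD_NO INTEGER PRIMARY KEY,
        MAG_NAME    TEXT NOT NULL,
        START_YEAR  INTEGER
    );
    CREATE TABLE MagDetail (
        MAG_HEAD_NO INTEGER NOT NULL REFERENCES MagHead(MAG_HEAD_NO),
        MAG_DETL_NO INTEGER NOT NULL,
        YEAR        INTEGER,
        VOLUME      INTEGER,
        NUMBER      INTEGER,
        MONTH       INTEGER,
        QUANTITY    INTEGER,
        PRIMARY KEY (MAG_HEAD_NO, MAG_DETL_NO)  -- composite key of two attributes
    );
    INSERT INTO MagHead VALUES (1, 'Database Monthly', 2005);
    INSERT INTO MagDetail VALUES (1, 1, 2009, 5, 4, 4, 2);
""")
count = conn.execute("SELECT COUNT(*) FROM MagDetail").fetchone()[0]
print(count)  # 1
```

Each issue of a magazine is one MagDetail row; the heading data (name, start year) is stored once in MagHead, without data loss.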

And the magazine ticket_L&GB table with its attributes:

List of attributes  1NF          2NF          3NF
READER_NO           READER_NO    READER_NO    READER_NO
READER_NAME         READER_NAME  READER_NAME  MAG_HEAD_NO
MAG_HEAD_NO         MAG_HEAD_NO  MAG_HEAD_NO  MAG_DETL_NO
MAG_DETL_NO         MAG_DETL_NO  MAG_DETL_NO  BORROW_DATE
MAG_NAME            MAG_NAME     MAG_NAME     RETURN_DATE
YEAR                YEAR         YEAR         COMMENT
VOLUME              VOLUME       VOLUME
NUMBER              NUMBER       NUMBER       READER_NO
MONTH               MONTH        MONTH        READER_NAME
BORROW_DATE         BORROW_DATE  QUANTITY
RETURN_DATE         RETURN_DATE  BORROW_DATE  MAG_NAME
COMMENT             COMMENT      RETURN_DATE  YEAR
                                 COMMENT      VOLUME
                                              NUMBER
                                              MONTH
                                              COMMENT