
UNIT 05: Functional Dependencies and Normalization

Lesson 5.1: Informal Design Guidelines for Relation Schemas and Normalization for Relational Databases

STRUCTURE:

5.1.1. Informal Design Guidelines for Relation Schemas

5.1.2. Normal Forms Based on Primary Keys

- Normalization

- Normalization process:

- The Need for Normalization:

- Repeating groups:

- The First Normal Form:

5.1.3. General Definitions of 2NF, 3NF and BCNF

- 2NF

- 3NF

- Boyce-Codd Normal Form (BCNF)

5.1.1. Informal Design Guidelines for Relation Schemas

The goal of relational database design is to generate a set of relation schemas that lets us store information without unnecessary redundant (repeated) data. The same schemas should also let us retrieve information easily and efficiently.

To achieve this, we use a set of rules called normal forms. The main objectives of a well-designed relation schema are to:

Minimize data redundancy

Minimize insertion, deletion, and update anomalies

Reduce input and output delays

Reduce memory (storage) usage

Support a single consistent version of the truth

Normalization is an industry best practice for table (entity) design.

Working through these steps is a useful part of the requirements analysis and data modelling process in software development. It helps reduce all of the undesirable problems listed above.

Dependency:

Dependencies can be depicted with the help of a diagram.

Dependency Diagram

i) Depicts all dependencies found within a given table structure.

ii) Makes it less likely that any important dependency gets overlooked or missed.

iii) Shows the desirable dependencies, which are those based on the entire primary key.

Partial dependency:

A dependency based on only part of a composite primary key.

Transitive dependency:

A dependency of one non-prime attribute on another non-prime attribute.

A functional dependency

A functional dependency (FD) is a dependency of one set of attributes on another. Formally, X -> Y holds in a relation r if any two tuples of r that agree on the values of X also agree on the values of Y. For example, suppose there are three fields X, Y and Z in a relation r, and

X is dependent on Y

Y is dependent on Z

Then, by the transitivity of functional dependencies,

X is dependent on Z.

In X -> Y, the left-hand side X is called the determinant and the right-hand side Y is called the dependent. Both the determinant and the dependent are sets of attributes. A functional dependency is not a key constraint, and by itself it cannot determine or block the transactions of a query.

A functional dependency is a many-to-one relationship between two sets of attributes of a given relation r. Functional dependencies allow us to express constraints that we cannot express with superkeys. We shall use functional dependencies in two ways:

To test relations to see whether they are legal under a given set of functional dependencies. If a relation m is legal under a set R of functional dependencies, we say that m satisfies R.

To specify constraints on the set of legal relations. Here we concern ourselves only with those relations that satisfy a given set of functional dependencies. If we constrain ourselves to relations on schema m that satisfy a set R of functional dependencies, we say that R holds on m.
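The first use above, testing whether a relation satisfies an FD, can be sketched in a few lines of Python. This is a minimal illustration under some assumptions. A relation instance is modelled as a list of dictionaries, one per tuple. The function name `satisfies` is hypothetical. The sample rows take a subset of the columns of the building-project report in section 5.1.2. The check follows the definition directly: any two tuples that agree on the left-hand side must also agree on the right-hand side. Note that an instance satisfying an FD only shows that this data does not violate it. It does not prove that the FD holds on the schema.

```python
from typing import Iterable, Mapping, Sequence


def satisfies(rows: Iterable[Mapping[str, object]],
              lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if the relation instance `rows` satisfies lhs -> rhs."""
    seen: dict = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # setdefault returns the value stored for an earlier tuple with the
        # same determinant; a different dependent value violates the FD.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# A subset of the columns of the building-project report.
report = [
    {"ProNo": 15, "EmpNo": 101, "JobClass": "EE", "HourlyBill": 100},
    {"ProNo": 15, "EmpNo": 106, "JobClass": "Labor", "HourlyBill": 5},
    {"ProNo": 15, "EmpNo": 109, "JobClass": "CS", "HourlyBill": 25},
    {"ProNo": 15, "EmpNo": 113, "JobClass": "Accounts", "HourlyBill": 10},
]

print(satisfies(report, ["JobClass"], ["HourlyBill"]))  # True
print(satisfies(report, ["ProNo"], ["EmpNo"]))          # False
```

The second call fails because all four tuples share project 15 but have different employee numbers.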

5.1.2. Normal Forms Based on Primary Keys

- Normalization

Normalization is a method by which we break a table down into a better-structured form, so that updates and queries are simpler. It can also be viewed as a process of assigning attributes to entities.

Database normalization is a data design and organization process that helps build relational databases. It is applied to data structures on the basis of their functional dependencies and primary keys. It helps to:

Reduce data redundancy

Eliminate data anomalies

Produce controlled redundancy to link tables

No information is lost in a correct normalization. The result is a database that can produce the same information as the original.

- Normalization process:

Normalization works through a series of stages called normal forms, named as follows:

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce-Codd Normal Form (BCNF)

Fourth Normal Form (4NF)

Fifth Normal Form (5NF)

Domain-Key Normal Form (DKNF)

- The Need for Normalization:

Database normalization is a useful tool in the requirements analysis and data modelling process of software development. Normalization reduces undesirable problems by using functional dependencies and keys.

Example: A company that manages building projects charges its clients according to the billing hours spent on each contract. The hourly billing rate depends on the employee's position. Periodically, a report is generated that contains the following information:

ProNo  ProName  EmpNo  EmpName  JobClass  WorkingHours  HourlyBill  TotalCharge
15     HRV      101    Vivan    EE        2             100         200
                106    Rashmi   Labor     20            5           100
                109    Payal    CS        2             25          50
                113    Pallavi  Accounts  3             10          30
                                                        Subtotal    380
16     MRUX     (new project; no employees assigned yet)

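This table structure invites data inconsistencies.

The table displays potential data anomalies such as:

Update anomaly: modifying a job class (for example, its hourly rate) requires changing the row of every employee in that class.

Insertion anomaly: a new employee cannot be entered until they are assigned to a project.

Deletion anomaly: if an employee is deleted, other vital data (such as the only record of a job class and its rate) may be lost.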
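The update anomaly can be made concrete with a small Python sketch. It assumes the report is kept as one flat table, with the hourly rate repeated on every row. The extra row for project 16 (MRUX) is hypothetical, since the report lists no employees for that project. The helper name `inconsistent_classes` is also hypothetical. The sketch first recomputes each total charge as working hours × hourly bill, which reproduces the subtotal of 380. It then changes the EE rate in only one row, which leaves the table inconsistent.

```python
from collections import defaultdict

# Flat (unnormalized) rows: the hourly rate is repeated for each assignment.
rows = [
    {"ProNo": 15, "EmpNo": 101, "JobClass": "EE", "Hours": 2, "HourlyBill": 100},
    {"ProNo": 15, "EmpNo": 106, "JobClass": "Labor", "Hours": 20, "HourlyBill": 5},
    {"ProNo": 15, "EmpNo": 109, "JobClass": "CS", "Hours": 2, "HourlyBill": 25},
    {"ProNo": 15, "EmpNo": 113, "JobClass": "Accounts", "Hours": 3, "HourlyBill": 10},
]

# Total charge per row is working hours * hourly bill.
subtotal = sum(r["Hours"] * r["HourlyBill"] for r in rows)
print(subtotal)  # 380

# Hypothetical: employee 101 is also assigned to project 16 (MRUX).
rows.append({"ProNo": 16, "EmpNo": 101, "JobClass": "EE", "Hours": 4, "HourlyBill": 100})


def inconsistent_classes(rows):
    """Return the job classes that are stored with more than one hourly rate."""
    rates = defaultdict(set)
    for r in rows:
        rates[r["JobClass"]].add(r["HourlyBill"])
    return sorted(c for c, seen in rates.items() if len(seen) > 1)


rows[0]["HourlyBill"] = 110          # update anomaly: only one copy changed
print(inconsistent_classes(rows))    # ['EE']

# If the rate is stored once per job class, there is a single place to update.
job_rates = {"EE": 100, "Labor": 5, "CS": 25, "Accounts": 10}
job_rates["EE"] = 110
```

Storing the rate once per job class, as the 3NF Job table later in this lesson does, removes the possibility of this inconsistency.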

- Repeating groups:

A repeating group derives its name from the fact that a group of multiple entries can exist for any single occurrence of the key attribute. In the report above, for example, several employees are listed for project 15.

A relational table must not contain any repeating groups.

Normalizing the table structure will reduce these data redundancies.

- The First Normal Form:

In the first normal form, we try to make each table dependent on a primary key: each field in a table must be functionally dependent on the primary key.

Converting to 1NF:

To convert a table in a relational database into 1NF:

Eliminate repeating groups.

Determine the primary key.

Make sure the key uniquely identifies attribute values.

Make sure all attributes depend on the primary key.

Definition of 1NF:

A table is in 1NF if it is in a tabular format in which

All key attributes are defined.

There are no repeating groups in the table.

All attributes depend on the primary key.

All relational tables must satisfy the 1NF requirement.

A table in 1NF may still contain partial dependencies and therefore still be subject to data redundancy.

Example: Consider the DEPARTMENT relation schema, whose primary key is DNUMBER, and extend it by introducing the DLOCATIONS attribute as shown. Each department may have a number of locations, and this can be interpreted in two ways:

If the domain of DLOCATIONS contains atomic values but some tuples hold a set of these values, then DLOCATIONS is not functionally dependent on the primary key DNUMBER.

If instead the domain of DLOCATIONS contains sets of values, and is hence non-atomic, then DNUMBER -> DLOCATIONS, because each set is considered a single member of the attribute domain.

In either case, the DEPARTMENT relation is not in 1NF.

There are three techniques to convert it into 1NF:

1) Remove the DLOCATIONS attribute that violates 1NF and place it in a separate relation, together with the primary key DNUMBER of DEPARTMENT. The primary key of the new relation is the combination {DNUMBER, DLOCATION}.

2) Expand the key so that there is a separate tuple in the original DEPARTMENT relation for each location of a department, as in figure c. The primary key then becomes {DNUMBER, DLOCATION}. This has the disadvantage of introducing redundancy.

3) If the maximum number of values for the attribute is known (for example, at most three locations per department), replace the DLOCATIONS attribute by that many atomic attributes: DLOCATION1, DLOCATION2, and DLOCATION3. This has the disadvantage of introducing null values if most departments have fewer than three locations.

a) DEPARTMENT

DNAME  DNUMBER  DMGRSSN  DLOCATIONS

b) DEPARTMENT

DNAME           DNUMBER  DMGRSSN    DLOCATIONS
Research        5        333445577  {Bellaire, Sugarland, Houston}
Administration  4        367383733  {Stafford}
Headquarters    1        887737394  {Houston}

c) DEPARTMENT

DNAME           DNUMBER  DMGRSSN    DLOCATION
Research        5        333445577  Bellaire
Research        5        333445577  Sugarland
Research        5        333445577  Houston
Administration  4        367383733  Stafford
Headquarters    1        887737394  Houston

Figure: Normalization into 1NF. a) A relation schema that is not in 1NF. b) Example state of the DEPARTMENT relation. c) 1NF version of the DEPARTMENT relation.
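Techniques 1 and 2 can be sketched in Python using the data of figure b. This is an illustrative sketch, not a DBMS feature. Relations are modelled as lists of dictionaries, and DLOCATIONS is held as a Python set. The function names `split_out` and `expand_key` are hypothetical.

```python
departments = [
    {"DNAME": "Research", "DNUMBER": 5, "DMGRSSN": "333445577",
     "DLOCATIONS": {"Bellaire", "Sugarland", "Houston"}},
    {"DNAME": "Administration", "DNUMBER": 4, "DMGRSSN": "367383733",
     "DLOCATIONS": {"Stafford"}},
    {"DNAME": "Headquarters", "DNUMBER": 1, "DMGRSSN": "887737394",
     "DLOCATIONS": {"Houston"}},
]


def split_out(rows, key, attr):
    """Technique 1: drop `attr` and store (key, value) pairs in a new relation."""
    base = [{k: v for k, v in r.items() if k != attr} for r in rows]
    side = sorted((r[key], value) for r in rows for value in r[attr])
    return base, side


def expand_key(rows, attr, new_attr):
    """Technique 2: one tuple per value; the key grows to include `new_attr`."""
    out = []
    for r in rows:
        for value in sorted(r[attr]):
            flat = {k: v for k, v in r.items() if k != attr}
            flat[new_attr] = value
            out.append(flat)
    return out


department, locations = split_out(departments, "DNUMBER", "DLOCATIONS")
print(locations)
# [(1, 'Houston'), (4, 'Stafford'), (5, 'Bellaire'), (5, 'Houston'), (5, 'Sugarland')]

expanded = expand_key(departments, "DLOCATIONS", "DLOCATION")
print(len(expanded))  # 5
```

In the expanded result, DNAME and DMGRSSN of the Research department appear three times, which is the redundancy mentioned for technique 2. Technique 1 stores them once.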

1NF also disallows multivalued attributes that are themselves composite. These are called nested relations, because each tuple can have a relation within it. In the example below, each tuple represents an employee entity. The relation PROJS (PNUMBER, HOURS) within the tuple represents the employee's projects and the hours worked per week on each. The schema can be represented as follows:

EMP_PROJ (SSN, ENAME, {PROJS (PNUMBER, HOURS)})

Here the primary key of the EMP_PROJ relation is SSN, and PNUMBER is the partial key of the nested relation. To normalize this into 1NF, we move the nested relation attributes into a new relation and propagate the primary key into it, as shown in the figure.

a) EMP_PROJ

SSN  ENAME  PROJS (PNUMBER, HOURS)

b) EMP_PROJ

SSN        ENAME  PNUMBER  HOURS
123456789  Smith  1        32.7
                  2        6.9
666677575  Ram    3        40.0
333433349  Joy    1        20.0
                  2        20.0

c) EMP_PROJ1 (SSN, ENAME)        EMP_PROJ2 (SSN, PNUMBER, HOURS)

Figure: Normalizing nested relations into 1NF. a) Schema of the EMP_PROJ relation with a nested relation attribute PROJS. b) Extension of the EMP_PROJ relation showing the nested relation within each tuple. c) Decomposition of EMP_PROJ into relations EMP_PROJ1 and EMP_PROJ2 by propagating the primary key.
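The decomposition in figure c can be sketched in Python the same way. The nested PROJS tuples move into a second relation, and the primary key SSN is copied into each of them. This sketch makes one assumption: the nested relation is represented as a list of (PNUMBER, HOURS) pairs.

```python
emp_proj = [
    {"SSN": "123456789", "ENAME": "Smith", "PROJS": [(1, 32.7), (2, 6.9)]},
    {"SSN": "666677575", "ENAME": "Ram", "PROJS": [(3, 40.0)]},
    {"SSN": "333433349", "ENAME": "Joy", "PROJS": [(1, 20.0), (2, 20.0)]},
]

# EMP_PROJ1(SSN, ENAME): the outer attributes, one tuple per employee.
emp_proj1 = [(e["SSN"], e["ENAME"]) for e in emp_proj]

# EMP_PROJ2(SSN, PNUMBER, HOURS): one tuple per nested row, with the primary
# key SSN propagated; {SSN, PNUMBER} identifies each tuple.
emp_proj2 = [(e["SSN"], pnumber, hours)
             for e in emp_proj
             for pnumber, hours in e["PROJS"]]

print(len(emp_proj1), len(emp_proj2))  # 3 5
```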

5.1.3. General Definitions of 2NF, 3NF and BCNF

- 2NF

2NF is a normal form used in database normalization. It requires that all data elements in a table be fully functionally dependent on the table's primary key. If a data element depends on only part of the primary key, it is moved out to a separate table. If a table in 1NF has a single field as its primary key, it is automatically in 2NF.

A functional dependency X -> Y is a full functional dependency if removing any attribute A from X means that the dependency no longer holds, i.e. for every attribute $A \in X$, $(X - \{A\}) \not\rightarrow Y$.

A functional dependency X -> Y is a partial functional dependency if some attribute $A \in X$ can be removed from X and the dependency still holds, i.e. for some attribute $A \in X$, $(X - \{A\}) \rightarrow Y$.
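Both definitions can be tested mechanically with the attribute closure X+, the set of all attributes functionally determined by X. The Python sketch below is one way to do it. It assumes FDs are given as (left side, right side) pairs of attribute sets, and the helper names are hypothetical. It uses the EMP_PROJ dependencies from the 2NF example later in this lesson: {SSN, PNUMBER} -> HOURS, SSN -> ENAME, and PNUMBER -> {PNAME, PLOCATION}.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_full_dependency(x, y, fds):
    """True if X -> Y holds and removing any one attribute from X breaks it."""
    x, y = set(x), set(y)
    if not y <= closure(x, fds):
        raise ValueError("X -> Y is not implied by the given FDs")
    return all(not y <= closure(x - {a}, fds) for a in x)


emp_proj_fds = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),        # FD1
    ({"SSN"}, {"ENAME"}),                   # FD2
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),  # FD3
]

print(is_full_dependency({"SSN", "PNUMBER"}, {"HOURS"}, emp_proj_fds))  # True
print(is_full_dependency({"SSN", "PNUMBER"}, {"ENAME"}, emp_proj_fds))  # False
```

The sketch only tries removing one attribute at a time. That is enough: if X - {A} does not determine Y, no smaller subset of it can.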

Converting to 2NF:

Identify all key components:

Write each key component on a separate line.

Next, write the original (composite) key on the last line.

Finally, write down the dependent attributes after each key.

Each line becomes a new table.
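The steps above can be sketched as a small Python function. It is a simplified illustration that assumes the FDs are known and listed as (left side, right side) pairs. The name `split_2nf` is hypothetical. The attribute name Hours for the Works table follows the 3NF result below. The function visits each key component first and the whole key last. For each one, it collects the non-key attributes that part of the key determines, skipping any attribute an earlier line has already claimed.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def split_2nf(key, attributes, fds):
    """Return (key part, dependent attributes) pairs; each pair is a new table."""
    non_key = set(attributes) - set(key)
    assigned = set()
    tables = []
    # Key components first, then larger combinations, ending with the whole key.
    for size in range(1, len(key) + 1):
        for part in combinations(key, size):
            dependents = (closure(part, fds) & non_key) - assigned
            if dependents or size == len(key):
                tables.append((part, sorted(dependents)))
                assigned |= dependents
    return tables


project_fds = [
    ({"ProNo"}, {"ProName"}),
    ({"EmpNo"}, {"EmpName", "JobClass", "ChrgHours"}),
    ({"ProNo", "EmpNo"}, {"Hours"}),
]
attributes = ["ProNo", "ProName", "EmpNo", "EmpName", "JobClass", "ChrgHours", "Hours"]

for part, deps in split_2nf(("ProNo", "EmpNo"), attributes, project_fds):
    print(part, "->", deps)
# ('ProNo',) -> ['ProName']
# ('EmpNo',) -> ['ChrgHours', 'EmpName', 'JobClass']
# ('ProNo', 'EmpNo') -> ['Hours']
```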

2NF conversion result:

Consider the building project example above:

Pro_T: ProNo -> ProName

Emp_T: EmpNo -> EmpName, JobClass, ChrgHours

Works: {ProNo, EmpNo} -> TTbill

Definition of 2NF:

A table is in 2NF if:

It is in 1NF.

It includes no partial dependencies.

No attribute depends on only a portion of the primary key.

- 3NF

3NF is a normal form used in database normalization to check whether all non-key attributes of a relation depend only on the candidate keys of the relation. This means that all non-key attributes are mutually independent. In other words, a non-key attribute cannot be transitively dependent on a key through another non-key attribute.

A functional dependency X -> Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X -> Z and Z -> Y hold.

A relation schema R is in 3NF if each non-prime attribute of R meets both of the following conditions:

• It is fully functionally dependent on every key of R.

• It is non-transitively dependent on every key of R.


Or we can say: A relation schema R is in 3NF if, whenever a nontrivial functional dependency X -> A holds in R, either (a) X is a superkey of R, or (b) A is a prime attribute of R.

Converting to 3NF:
Resolve transitive dependencies.

Create a separate table for each transitive dependency.

3NF conversion result:

Consider the building project example above:

Pro_T: ProNo -> ProName

Emp_T: EmpNo -> EmpName, JobClass

Works: {ProNo, EmpNo} -> Hours

Job: JobClass -> ChrgHours
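The general 3NF test can be sketched in Python. The rule is that for every nontrivial FD X -> A, X must be a superkey or A must be a prime attribute. The sketch applies this to the Emp_T table of this example, using the FDs EmpNo -> EmpName, JobClass and JobClass -> ChrgHours as read from the 2NF and 3NF results above. It finds candidate keys by brute force, which is only practical for small schemas. It checks only the FDs passed in, and the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute-force search for minimal attribute sets that determine everything."""
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= set(attributes):
                keys.append(s)
    return keys


def violations_3nf(attributes, fds):
    """List (X, A) for each listed FD X -> A that breaks the 3NF condition."""
    prime = set().union(*candidate_keys(attributes, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attributes)
        for a in sorted(set(rhs) - set(lhs)):  # ignore trivial parts
            if not is_superkey and a not in prime:
                bad.append((tuple(sorted(lhs)), a))
    return bad


emp_t = ["EmpNo", "EmpName", "JobClass", "ChrgHours"]
emp_fds = [({"EmpNo"}, {"EmpName", "JobClass"}), ({"JobClass"}, {"ChrgHours"})]
print(violations_3nf(emp_t, emp_fds))  # [(('JobClass',), 'ChrgHours')]

# After moving JobClass -> ChrgHours into the separate Job table:
print(violations_3nf(["EmpNo", "EmpName", "JobClass"],
                     [({"EmpNo"}, {"EmpName", "JobClass"})]))  # []
print(violations_3nf(["JobClass", "ChrgHours"],
                     [({"JobClass"}, {"ChrgHours"})]))         # []
```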

Normalizing into 2NF and 3NF Example:

The test for 2NF involves testing functional dependencies whose left-hand-side attributes are part of the primary key. The EMP_PROJ relation in the figure, a different schema from the nested EMP_PROJ above, is in 1NF but not in 2NF. The non-prime attribute ENAME violates 2NF because of FD2, as do PNAME and PLOCATION because of FD3. FD2 and FD3 make ENAME, PNAME and PLOCATION partially dependent on the primary key {SSN, PNUMBER} of EMP_PROJ, thus violating 2NF. EMP_PROJ is therefore decomposed into EP1, EP2, and EP3, each of which is in 2NF, as shown in Figure a.

a) Normalizing EMP_PROJ into 2NF relations

EMP_PROJ (SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)

FD1: {SSN, PNUMBER} -> HOURS

FD2: SSN -> ENAME

FD3: PNUMBER -> {PNAME, PLOCATION}

2NF NORMALIZATION

EP1 (SSN, PNUMBER, HOURS)          FD1

EP2 (SSN, ENAME)                   FD2

EP3 (PNUMBER, PNAME, PLOCATION)    FD3

The relation schema EMP_DEPT in the following figure is in 2NF but not in 3NF, because of the transitive dependency of DMGRSSN (and also DNAME) on SSN via DNUMBER. EMP_DEPT is decomposed into ED1 and ED2, each of which is in 3NF, as in Figure b.

b) Normalizing EMP_DEPT into 3NF relations

EMP_DEPT (ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)

SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}

DNUMBER -> {DNAME, DMGRSSN}

3NF NORMALIZATION

ED1 (ENAME, SSN, BDATE, ADDRESS, DNUMBER)

ED2 (DNUMBER, DNAME, DMGRSSN)
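To illustrate that this decomposition loses no information, the Python sketch below projects a small EMP_DEPT instance onto ED1 and ED2. It then joins the projections back together. The employee rows are hypothetical: only the department numbers, names and manager SSNs reuse values from the DEPARTMENT figure. BDATE and ADDRESS are omitted for brevity. `project` and `natural_join` are illustrative helpers, not a library API.

```python
from operator import itemgetter


def project(rows, attrs):
    """Keep only `attrs`, dropping duplicate tuples as relational projection does."""
    unique = {tuple(r[a] for a in attrs) for r in rows}
    return [dict(zip(attrs, t)) for t in sorted(unique)]


def natural_join(left, right):
    """Combine tuples that agree on every attribute the two relations share."""
    common = left[0].keys() & right[0].keys()
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


emp_dept = [  # hypothetical instance; BDATE and ADDRESS omitted
    {"ENAME": "Smith", "SSN": "123456789", "DNUMBER": 5,
     "DNAME": "Research", "DMGRSSN": "333445577"},
    {"ENAME": "Joy", "SSN": "333433349", "DNUMBER": 5,
     "DNAME": "Research", "DMGRSSN": "333445577"},
    {"ENAME": "Ram", "SSN": "666677575", "DNUMBER": 1,
     "DNAME": "Headquarters", "DMGRSSN": "887737394"},
]

ed1 = project(emp_dept, ["ENAME", "SSN", "DNUMBER"])
ed2 = project(emp_dept, ["DNUMBER", "DNAME", "DMGRSSN"])
print(len(ed2))  # 2: the Research details are now stored once

rejoined = natural_join(ed1, ed2)
by_ssn = itemgetter("SSN")
print(sorted(rejoined, key=by_ssn) == sorted(emp_dept, key=by_ssn))  # True
```

DNUMBER is the key of ED2, so each ED1 tuple joins with exactly one ED2 tuple. The join therefore returns the original rows and no spurious ones.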

- Boyce-Codd Normal Form (BCNF)

BCNF is a normal form used in database normalization. It is a slightly stronger version of 3NF.

A table is in BCNF if and only if:

(a) It is in 3NF, and

(b) For every one of its nontrivial functional dependencies X -> Y, X is a superkey.

Or: A relation schema R is in BCNF if, whenever a nontrivial functional dependency X -> A holds in R, X is a superkey of R.

Converting to BCNF:

In BCNF, every determinant in the table is a candidate key. A candidate key has the same characteristics as the primary key but, for some reason, was not chosen as the primary key.

If a table contains only one candidate key, 3NF and BCNF are equivalent.

BCNF can only be violated if the table contains more than one candidate key.

Example:

Sid  Status  City    Pid  Quantity
S1   20      LONDON  P1   300
S1   20      LONDON  P2   200
S1   20      LONDON  P3   400
S1   20      LONDON  P4   200
S1   20      LONDON  P5   100
S1   20      LONDON  P6   100
S2   10      PARIS   P1   300
S2   10      PARIS   P2   400
S3   10      PARIS   P2   200
S4   20      LONDON  P2   200
S4   20      LONDON  P4   300
S4   20      LONDON  P5   400

2NF:

Supply: Sid, Status, City

Product: Sid, Pid, Quantity

Supply table:

Sid  Status  City
S1   20      LONDON
S2   10      PARIS
S3   10      PARIS
S4   20      LONDON

Product table:

Sid  Pid  Quantity
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400

3NF:

Supply: Sid, Status

City: Status, City

Product: Sid, Pid, Quantity

Supply table:

Sid  Status
S1   20
S2   10
S3   10
S4   20

City table:

Status  City
20      LONDON
10      PARIS

This is in BCNF: the City table has only one candidate key, Status, which is also its only determinant.
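The BCNF condition can be checked in a similar way: every nontrivial FD X -> Y must have a superkey as its left side. The Python sketch below applies this to the supplier example. It assumes the FDs Sid -> Status, Status -> City (the dependency this example decomposes on), and {Sid, Pid} -> Quantity. These FDs are read off the example rather than stated in it, and the function names are hypothetical. The check covers only the FDs passed in, so each decomposed table is given the FDs that apply to it.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the listed nontrivial FDs whose left side is not a superkey."""
    everything = set(attributes)
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency: always allowed
        if not closure(lhs, fds) >= everything:
            bad.append((sorted(lhs), sorted(rhs)))
    return bad


supplier_fds = [
    ({"Sid"}, {"Status"}),
    ({"Status"}, {"City"}),
    ({"Sid", "Pid"}, {"Quantity"}),
]

print(bcnf_violations(["Sid", "Status", "City", "Pid", "Quantity"], supplier_fds))
# [(['Sid'], ['Status']), (['Status'], ['City'])]

print(bcnf_violations(["Sid", "Status"], [({"Sid"}, {"Status"})]))    # []
print(bcnf_violations(["Status", "City"], [({"Status"}, {"City"})]))  # []
print(bcnf_violations(["Sid", "Pid", "Quantity"],
                      [({"Sid", "Pid"}, {"Quantity"})]))              # []
```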

Example No. 2

Sid   Adv  Adv_room  C1     C2     C3
1022  J    412       101-7  143-1  159-2
4123  S    216       201-1  211-2  214-1

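1NF:

Sid   Adv  Adv_room  Cid
1022  J    412       101-7
1022  J    412       143-1
1022  J    412       159-2
4123  S    216       201-1
4123  S    216       211-2
4123  S    216       214-1

2NF:

Student: Sid, Adv, Adv_room

Registration: Sid, Cid

3NF:

Student: Sid, Adv

Advisor: Adv, Adv_room

Registration: Sid, Cid

Registration table:

Sid   Cid
1022  101-7
1022  143-1
1022  159-2
4123  201-1
4123  211-2
4123  214-1

Student table:

Sid   Adv
1022  J
4123  S

Advisor table:

Adv  Adv_room
J    412
S    216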
