NORMALISATION:- Normalisation is a schema refinement process. It helps in removing anomalies during
insert, update and delete operations. Normalisation is the process of decomposing relations to produce
smaller, well-defined relations.

NEED OF NORMALISATION:- Normalisation is a refinement approach based on decomposition.
Decomposition eliminates redundant storage and maintains data consistency. Redundant storage of
information is the root cause of the problems described below.

PROBLEMS CAUSED BY REDUNDANCY:- Storing information redundantly, that is, in more than one place
within a database, can lead to several problems. The following table, which is in unnormalised form,
gives a brief idea of the anomalies that occur.

STUDENT DETAILS:

S.NO  S.NAME    SUBJECT  MARKS  GRADE
501   VIKRANTH  DBMS     80     A
502   ABHISHEK  CO       60     C
562   PRAVEEN   JAVA     70     B
501   VIKRANTH  FLAT     60     C
502   ABHISHEK  ES       50     F
562   PRAVEEN   DAA      70     B

REDUNDANT STORAGE: Information that is stored repeatedly is known as redundant storage. In the above
table each of the S.NO values 501, 502 and 562 (with its name) appears twice, and the marks 70 and 60
are repeated together with their grades B and C. This is an example of redundant storage.

UPDATE ANOMALIES: If one copy of such repeated data is updated, an inconsistency is created unless all
copies are similarly updated. If we change the grade for 60 marks to B in the record of S.NO 502, we
must make the same update in the record of S.NO 501; otherwise the data will be in an inconsistent
state.

INSERTION ANOMALIES: It may not be possible to store certain information unless some other,
unrelated, information is stored as well. Suppose we have only a marks and grade pair to record. To
insert it into the above table we would have to put null values in the remaining fields. But S.NO is
part of the primary key, which allows neither null values nor duplicates, so the row cannot be
inserted. This is an insertion anomaly.

DELETION ANOMALIES: It may not be possible to delete certain information without losing some other,
unrelated, information as well. If we delete the records of S.NO 502, the fact that 50 marks earns
grade F is deleted too, because it is not repeated in any other record. If we later need to insert a
record with 50 marks, the grade information will not be available. This is a deletion anomaly.

DECOMPOSITION: A decomposition of a relation schema R consists of replacing the relation schema by
two (or more) relation schemas that each contain a subset of the attributes of R and together include
all attributes in R. Intuitively, we want to store the information in any given instance of R by
storing projections of the instance.

We can decompose the above STUDENT_DETAILS into two relations:

1. Student(sno, sname, subject, marks)
2. Marks(marks, grade)

STUDENT:

S.NO  S.NAME    SUBJECT  MARKS
501   VIKRANTH  DBMS     80
502   ABHISHEK  CO       60
562   PRAVEEN   JAVA     70
501   VIKRANTH  FLAT     60
502   ABHISHEK  ES       50
562   PRAVEEN   DAA      70

MARKS:

MARKS  GRADE
80     A
70     B
60     C
50     F

• To change the grade for 60 marks from C to B, it is now enough to update one row in the Marks
  table. Before decomposition we needed two updates.
• When we delete the records of S.NO 502, the information that 50 marks earns grade F is not
  deleted.
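The decomposition above can be sketched in Python. This is only an illustration: the rows are taken from the STUDENT DETAILS table, while the names `student`, `marks_table` and `join` are assumptions made for this sketch.

```python
# Rows of the unnormalised table: (sno, sname, subject, marks, grade)
student_details = [
    (501, "VIKRANTH", "DBMS", 80, "A"),
    (502, "ABHISHEK", "CO", 60, "C"),
    (562, "PRAVEEN", "JAVA", 70, "B"),
    (501, "VIKRANTH", "FLAT", 60, "C"),
    (502, "ABHISHEK", "ES", 50, "F"),
    (562, "PRAVEEN", "DAA", 70, "B"),
]

# Projection onto Student(sno, sname, subject, marks)
student = [row[:4] for row in student_details]

# Projection onto Marks(marks, grade); the dict keeps one entry per marks value
marks_table = {row[3]: row[4] for row in student_details}


def join(student_rows, grade_of):
    """Natural join of Student and Marks on the marks column."""
    return [row + (grade_of[row[3]],) for row in student_rows]


# Joining the two projections gives back the original rows
rejoined = join(student, marks_table)

# Changing the grade for 60 marks is now a single update
marks_table[60] = "B"
updated = join(student, marks_table)
```

After the single update, both records with 60 marks show grade B when the tables are joined, so the copies cannot drift apart.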

TERMS USED IN NORMALISATION:-

Before studying the normal forms we must be familiar with the following terms.

1) FUNCTIONAL DEPENDENCY:- In a given relation R, let X and Y be attributes. Attribute Y is
functionally dependent on attribute X if each value of X determines exactly one value of Y. This is
represented as X -> Y.

Example:- In the above Marks table the value of grade is determined by the marks. This dependency is
represented as marks -> grade and read as "marks determines grade".
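A functional dependency can be tested against a table instance. The sketch below is a minimal check, with the helper name `holds_fd` chosen for this illustration. Note that an instance can only refute a dependency; a dependency that happens to hold in one instance is not thereby proven for the schema.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this instance (rows are dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first y seen for x is remembered; any later mismatch breaks the FD
        if seen.setdefault(x, y) != y:
            return False
    return True


# sno, marks and grade columns of the STUDENT DETAILS table
rows = [
    {"sno": 501, "marks": 80, "grade": "A"},
    {"sno": 502, "marks": 60, "grade": "C"},
    {"sno": 562, "marks": 70, "grade": "B"},
    {"sno": 501, "marks": 60, "grade": "C"},
    {"sno": 502, "marks": 50, "grade": "F"},
    {"sno": 562, "marks": 70, "grade": "B"},
]

marks_determines_grade = holds_fd(rows, ["marks"], ["grade"])
sno_determines_marks = holds_fd(rows, ["sno"], ["marks"])
```

Here `marks -> grade` holds, while `sno -> marks` fails because S.NO 501 appears with both 80 and 60 marks.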

DETERMINANT:- Attribute X is a determinant if it uniquely determines the value of another attribute in
a given relation. To qualify as a determinant, an attribute need not be a key attribute. A dependency
is usually written X -> Y, which means attribute X determines attribute Y.

Example:- In the Marks table the marks attribute decides the grade attribute. This is represented as
marks -> grade and read as "marks decides grade".

FULL FUNCTIONAL DEPENDENCY:- In a given relation R with attributes X and Y, where X may be
composite, Y is fully functionally dependent on X if Y is functionally dependent on X but not on any
proper subset of X.

Example:- Consider the relation REPORT with Studentno, Courseno, Coursename, Iname, Marks, Grade as
attributes. Studentno and Courseno together form a composite attribute that determines exactly one
value of Marks. This is represented symbolically as Studentno, Courseno -> Marks.

Here Marks depends on both Studentno and Courseno together, not on either attribute alone.

PARTIAL FUNCTIONAL DEPENDENCY:- In a given relation R with attributes X and Y, where X may be
composite, Y is partially dependent on X if Y is functionally dependent on a proper subset of X.

Example:- Consider the relation sailors-boats with attributes SID, BID, Sname, age, Bcolor. In the
dependency SID, BID -> Bcolor, Bcolor is partially functionally dependent on SID, BID because
Bcolor depends on BID alone and not on SID.

TRANSITIVE FUNCTIONAL DEPENDENCY:-

A functional dependency X -> Y in a relation schema R is a transitive dependency if there is a set of
attributes Z that is neither a candidate key nor a subset of any key of R, and both X -> Z and Z -> Y
hold.

EXAMPLE

For example, consider the relation SCR:

Sno  Sname    D.O.B     Courseno  Coursename  Marks  Grade
580  Sasi     09-06-92  Co-5      Co          62     C
582  Sandeep  08-02-92  Oops-6    Oops        72     B
588  Chandhu  02-02-92  Flat-4    Flat        68     B
594  Abidh    03-03-92  Es-3      Es          82     A
595  Aman     04-04-92  Daa-2     Daa         70     B

Here Sno, Courseno together determine one Grade value at a time, so

Sno, Courseno -> Grade is a full functional dependency.

There is an attribute Marks that is neither a candidate key nor a subset of any key, and both

Sno, Courseno -> Marks

Marks -> Grade

hold. Hence Grade is transitively dependent on Sno, Courseno.
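The transitive chain in the SCR example can be traced in a few lines of Python. This is a sketch that uses only the Sno, Courseno, Marks and Grade columns of the SCR table; the dictionary names are assumptions made for the illustration.

```python
# (sno, courseno, marks, grade) columns of the SCR relation
scr = [
    (580, "Co-5", 62, "C"),
    (582, "Oops-6", 72, "B"),
    (588, "Flat-4", 68, "B"),
    (594, "Es-3", 82, "A"),
    (595, "Daa-2", 70, "B"),
]

# X -> Z: (sno, courseno) determines marks
key_to_marks = {(sno, cno): marks for sno, cno, marks, _ in scr}

# Z -> Y: marks determines grade
marks_to_grade = {marks: grade for _, _, marks, grade in scr}

# Composing the two mappings derives X -> Y without reading grade directly
derived_grade = {key: marks_to_grade[m] for key, m in key_to_marks.items()}

# The grade as stored in the relation, for comparison
direct_grade = {(sno, cno): grade for sno, cno, _, grade in scr}
```

The derived mapping agrees with the stored grades, which is exactly why storing Grade next to Marks in the same relation is redundant.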

MULTIVALUED DEPENDENCY: Let R be a relation and let A and B be subsets of the attributes of R. There
is a multivalued dependency A ->-> B if and only if each A value determines a set of B values,
independently of the other attributes. Suppose we have a relation with attributes course, teacher and
book, which we denote CTB. The meaning of a tuple is that teacher T can teach course C and book B is a
recommended text for the course.

There are no FDs; the key is CTB. However, the recommended texts for a course are independent of the
instructor.

Course  Teacher  Book
Daa     Ramesh   Coreman
Daa     Ramesh   Ellis horowitz
Daa     Vikram   Coreman
Daa     Vikram   Ellis horowitz
Dbms    Vijay    Raghuramakrishna
Dbms    Vijay    Korth
Dbms    Sharma   Raghuramakrishna
Dbms    Sharma   Korth

Here the relation has two multivalued dependencies, represented as course ->-> teacher and
course ->-> book. The first is read as "teacher is multivalued dependent on course" or "course
multidetermines teacher"; the second as "book is multivalued dependent on course" or "course
multidetermines book".

Due to the multivalued dependencies, data is stored repeatedly. The redundancy can be eliminated by
decomposing CTB into CT and CB.

CT

Course  Teacher
Daa     Ramesh
Daa     Vikram
Dbms    Vijay
Dbms    Sharma

CB

Course  Book
Daa     Coreman
Daa     Ellis horowitz
Dbms    Raghuramakrishna
Dbms    Korth

The redundancy is eliminated to some extent.
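As a rough check, the decomposition of CTB into CT and CB can be reproduced with Python sets. The sketch assumes that CTB holds every teacher and book combination for each course, which is what the two multivalued dependencies require.

```python
# CTB as a set of (course, teacher, book) tuples
ctb = {
    ("Daa", "Ramesh", "Coreman"),
    ("Daa", "Ramesh", "Ellis horowitz"),
    ("Daa", "Vikram", "Coreman"),
    ("Daa", "Vikram", "Ellis horowitz"),
    ("Dbms", "Vijay", "Raghuramakrishna"),
    ("Dbms", "Vijay", "Korth"),
    ("Dbms", "Sharma", "Raghuramakrishna"),
    ("Dbms", "Sharma", "Korth"),
}

# Projections onto (course, teacher) and (course, book)
ct = {(course, teacher) for course, teacher, _ in ctb}
cb = {(course, book) for course, _, book in ctb}

# Natural join of CT and CB on course
rejoined = {
    (course, teacher, book)
    for course, teacher in ct
    for other_course, book in cb
    if course == other_course
}
```

Eight three-column rows are replaced by four plus four two-column rows, and the join of the projections gives back exactly the original relation.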

CONDITIONS FOR MULTIVALUED DEPENDENCIES

If the multivalued dependency X ->-> Y holds over R and Z = R - XY, the following must be true for
every legal instance r of R:

If t1 ∈ r, t2 ∈ r and t1.X = t2.X, then there must be some t3 ∈ r such that t1.XY = t3.XY and
t2.Z = t3.Z.

To illustrate the condition, consider the following example.

X  Y   Z
a  b1  c1   Tuple t1
a  b2  c2   Tuple t2
a  b1  c2   Tuple t3
a  b2  c1   Tuple t4

Given t1 and t2, the condition requires t3. Interchanging the roles of t1 and t2 shows that tuple t4
must also be in the relation instance.
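The t1, t2, t3 condition translates directly into a brute-force check. The following is a sketch for small instances only; the function name `satisfies_mvd` and its argument layout are assumptions made for this illustration.

```python
def satisfies_mvd(rows, attrs, x, y):
    """Check X ->-> Y on an instance using the t1, t2, t3 condition.

    rows  : list of tuples
    attrs : attribute names, in the column order of the tuples
    x, y  : lists of attribute names
    """
    z = [a for a in attrs if a not in x and a not in y]
    index = {a: i for i, a in enumerate(attrs)}

    def pick(t, names):
        return tuple(t[index[a]] for a in names)

    for t1 in rows:
        for t2 in rows:
            if pick(t1, x) != pick(t2, x):
                continue
            # Some t3 must agree with t1 on XY and with t2 on Z
            if not any(
                pick(t3, x + y) == pick(t1, x + y) and pick(t3, z) == pick(t2, z)
                for t3 in rows
            ):
                return False
    return True


attrs = ["X", "Y", "Z"]
t1, t2, t3, t4 = ("a", "b1", "c1"), ("a", "b2", "c2"), ("a", "b1", "c2"), ("a", "b2", "c1")

all_four = satisfies_mvd([t1, t2, t3, t4], attrs, ["X"], ["Y"])
without_t4 = satisfies_mvd([t1, t2, t3], attrs, ["X"], ["Y"])
```

With all four tuples the dependency holds; dropping t4 breaks it, because the pair (t2, t1) then has no matching third tuple.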

SUPER KEY: An attribute (or combination of attributes) that uniquely identifies each row in a table is
called a super key.

CANDIDATE KEY: A minimal super key, i.e., a super key no proper subset of which is itself a super
key.

PRIMARY KEY: A candidate key selected to uniquely identify all other attribute values in any given
row. It cannot contain duplicate or null values.

FOREIGN KEY: An attribute or combination of attributes in one table whose values must either match
the primary key in another table or be null.
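The difference between a super key and a candidate key can be illustrated on a table instance. This is a sketch only: the rows are the sno, courseno and marks columns of the student and course data used in the normal-form examples of these notes, and the function names are assumptions. An instance can show that an attribute set is not a key, but uniqueness in one instance does not prove that it is a key of the schema.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def is_candidate_key(rows, attrs):
    """True if attrs is a super key and no proper subset of it is one."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


rows = [
    {"sno": 580, "courseno": "Co-5", "marks": 62},
    {"sno": 582, "courseno": "Oops-6", "marks": 72},
    {"sno": 588, "courseno": "Flat-4", "marks": 70},
    {"sno": 594, "courseno": "Es-3", "marks": 82},
    {"sno": 595, "courseno": "Daa-2", "marks": 70},
    {"sno": 580, "courseno": "Dbms-1", "marks": 82},
    {"sno": 582, "courseno": "Daa-2", "marks": 74},
    {"sno": 588, "courseno": "Co-5", "marks": 70},
]

small_key = ("sno", "courseno")
wide_key = ("sno", "courseno", "marks")
```

Both `small_key` and `wide_key` are super keys in this instance, but only `small_key` is minimal, so only it qualifies as a candidate key.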

NORMAL FORMS: Given a relation schema, we need to decide whether it is a good design or whether we
need to decompose it into smaller relations. Such a decision must be guided by an understanding of
what problems, if any, arise from the current schema. To provide such guidance several normal forms
have been proposed. If a relation schema is in one of these normal forms, we know that certain kinds
of problems cannot arise.

STEPS IN NORMALISATION: Normalisation can be accomplished and understood in steps, and each step
results in a normal form. The different normal forms are as follows.

FIRST NORMAL FORM: Any multivalued attributes (also called repeating groups) have been removed, so
there is a single value at the intersection of each row and column of the table. A relation schema is
said to be in 1-NF if the values in the domain of each attribute of the relation are atomic. In other
words, only one value is associated with each attribute, and the value of that attribute is not a set
or a list of values.

BEFORE 1-NF

Sno  Sname    Courseno       Coursename  Marks   Grade
580  Sasi     Co-5, Dbms-1   Co, Dbms    62, 82  C, A
582  Sandeep  Oops-6, Daa-2  Oops, Daa   72, 74  B, B
588  Chandhu  Flat-4, Co-5   Flat, Co    70, 70  B, B
594  Abidh    Es-3           Es          82      A
595  Aman     Daa-2          Daa         70      B

Here the fields course no, course name, marks and grade contain multiple values.

AFTER 1-NF

Sno  Sname    Courseno  Coursename  Marks  Grade
580  Sasi     Co-5      Co          62     C
582  Sandeep  Oops-6    Oops        72     B
588  Chandhu  Flat-4    Flat        70     B
594  Abidh    Es-3      Es          82     A
595  Aman     Daa-2     Daa         70     B
580  Sasi     Dbms-1    Dbms        82     A
582  Sandeep  Daa-2     Daa         74     B
588  Chandhu  Co-5      Co          70     B
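The conversion to 1-NF amounts to splitting each multivalued cell and emitting one row per value. The sketch below assumes the multivalued cells are comma-separated strings, as in the BEFORE 1-NF table; the function name `to_1nf` is hypothetical.

```python
# (sno, sname, courseno, coursename, marks, grade) with comma-separated cells
unnormalised = [
    (580, "Sasi", "Co-5, Dbms-1", "Co, Dbms", "62, 82", "C, A"),
    (582, "Sandeep", "Oops-6, Daa-2", "Oops, Daa", "72, 74", "B, B"),
    (588, "Chandhu", "Flat-4, Co-5", "Flat, Co", "70, 70", "B, B"),
    (594, "Abidh", "Es-3", "Es", "82", "A"),
    (595, "Aman", "Daa-2", "Daa", "70", "B"),
]


def to_1nf(rows):
    """Emit one row per course so that every cell holds a single value."""
    flat = []
    for sno, sname, *multi in rows:
        # Split each multivalued cell into its list of values
        columns = [[value.strip() for value in cell.split(",")] for cell in multi]
        # zip(*columns) pairs up the i-th value of every multivalued cell
        for courseno, coursename, marks, grade in zip(*columns):
            flat.append((sno, sname, courseno, coursename, int(marks), grade))
    return flat


flat_rows = to_1nf(unnormalised)
```

The result has the same eight rows as the AFTER 1-NF table, although grouped by student rather than in the order shown there.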

DRAWBACK: The main drawback of 1-NF is redundancy of data.

SECOND NORMAL FORM: A relation is in 2-NF if and only if it is in 1-NF and every non-key column
depends on the whole key, not on a subset of the key.

In the above table, suppose we take (sno, courseno, marks) as the key. The dependency
sno, courseno, marks -> marks then has marks on both sides, but marks is not needed on the left-hand
side: sno, courseno alone is enough to determine marks, so the key is (sno, courseno). All non-prime
attributes of R must be fully functionally dependent on the whole key of the relation, not on a part
of the key.

EXAMPLE

sno, courseno -> marks, grade

sno -> marks, grade (does not hold)

Here marks and grade are fully functionally dependent on sno and courseno together, whereas they are
not functionally dependent on sno alone.

Every non-key attribute must be fully functionally dependent on the primary key.

2-NF is based on the concept of full functional dependency and the removal of partial functional
dependencies.
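Partial dependencies can be searched for by testing every proper subset of the key against every non-key attribute. The following sketch does this on the 1-NF student table, assuming the key (sno, courseno); the helper names are assumptions, and a dependency found in one instance is only a hint about the schema.

```python
from itertools import combinations

COLUMNS = ("sno", "sname", "courseno", "coursename", "marks", "grade")
rows = [dict(zip(COLUMNS, values)) for values in [
    (580, "Sasi", "Co-5", "Co", 62, "C"),
    (582, "Sandeep", "Oops-6", "Oops", 72, "B"),
    (588, "Chandhu", "Flat-4", "Flat", 70, "B"),
    (594, "Abidh", "Es-3", "Es", 82, "A"),
    (595, "Aman", "Daa-2", "Daa", 70, "B"),
    (580, "Sasi", "Dbms-1", "Dbms", 82, "A"),
    (582, "Sandeep", "Daa-2", "Daa", 74, "B"),
    (588, "Chandhu", "Co-5", "Co", 70, "B"),
]]


def holds_fd(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this instance."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (part_of_key, attribute) pairs where a proper subset of the key
    already determines a non-key attribute."""
    found = []
    for size in range(1, len(key)):
        for part in combinations(key, size):
            for attr in non_key:
                if holds_fd(rows, part, (attr,)):
                    found.append((part, attr))
    return found


violations = partial_dependencies(
    rows, ("sno", "courseno"), ("sname", "coursename", "marks", "grade")
)
```

The check reports that sname depends on sno alone and coursename on courseno alone, while marks and grade need the whole key. These two partial dependencies are what the 2-NF decomposition removes.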

EXAMPLE: BEFORE 2-NF

Sno  Sname    Courseno  Coursename  Marks  Grade
580  Sasi     Co-5      Co          62     C
582  Sandeep  Oops-6    Oops        72     B
588  Chandhu  Flat-4    Flat        70     B
594  Abidh    Es-3      Es          82     A
595  Aman     Daa-2     Daa         70     B
580  Sasi     Dbms-1    Dbms        82     A
582  Sandeep  Daa-2     Daa         74     B
588  Chandhu  Co-5      Co          70     B

This relation is decomposed into three tables. In the first table (student), the key attribute is sno
and the non-key attribute is sname. In the second table (course), courseno is the key attribute and
course name is the only non-key attribute.

In the third table (result), sno and courseno together form the key and the non-key attributes are
marks and grade. Marks and grade are fully functionally dependent on sno, courseno. All the non-prime
attributes of R must be fully functionally dependent on the whole key of R, not on a part of the key.

AFTER 2-NF

TABLE-1

Sno  Sname
580  Sasi
582  Sandeep
588  Chandhu
594  Abidh
595  Aman

TABLE-2

Courseno  Coursename
Co-5      Co
Oops-6    Oops
Flat-4    Flat
Es-3      Es
Daa-2     Daa
Dbms-1    Dbms

TABLE-3

Sno  Courseno  Marks  Grade
580  Co-5      62     C
580  Dbms-1    82     A
582  Oops-6    72     B
582  Daa-2     74     B
588  Flat-4    70     B
588  Co-5      70     B
594  Es-3      82     A
595  Daa-2     70     B
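The three tables are projections of the BEFORE 2-NF relation with duplicate rows removed. A minimal sketch, using the rows of the BEFORE 2-NF table and a hypothetical helper `project`:

```python
COLUMNS = ("sno", "sname", "courseno", "coursename", "marks", "grade")
rows = [dict(zip(COLUMNS, values)) for values in [
    (580, "Sasi", "Co-5", "Co", 62, "C"),
    (582, "Sandeep", "Oops-6", "Oops", 72, "B"),
    (588, "Chandhu", "Flat-4", "Flat", 70, "B"),
    (594, "Abidh", "Es-3", "Es", 82, "A"),
    (595, "Aman", "Daa-2", "Daa", 70, "B"),
    (580, "Sasi", "Dbms-1", "Dbms", 82, "A"),
    (582, "Sandeep", "Daa-2", "Daa", 74, "B"),
    (588, "Chandhu", "Co-5", "Co", 70, "B"),
]]


def project(rows, attrs):
    """Projection with duplicate elimination, keeping first-seen order."""
    return list(dict.fromkeys(tuple(row[a] for a in attrs) for row in rows))


table1 = project(rows, ("sno", "sname"))                      # student
table2 = project(rows, ("courseno", "coursename"))            # course
table3 = project(rows, ("sno", "courseno", "marks", "grade")) # result

# Joining the three tables back on sno and courseno
names = dict(table1)
courses = dict(table2)
rejoined = [
    {"sno": s, "sname": names[s], "courseno": c,
     "coursename": courses[c], "marks": m, "grade": g}
    for s, c, m, g in table3
]
```

Each student name and each course name is now stored once, and the join of the three tables still reproduces every original row.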

THIRD NORMAL FORM

A relation is in 3-NF if

It is in 2-NF

It contains no transitive dependency

A relation R is in 3-NF if and only if it is in 2-NF and no non-key column depends on another non-key
column.

No transitive dependency exists between a non-key attribute and the key.

In simple words, in 3-NF the left-hand side of every functional dependency that determines a non-key
attribute must be a key.

sno, courseno -> marks -> grade

Grade depends on marks, and marks in turn depends on sno, courseno; hence grade is transitively
dependent on sno, courseno.

An example gives a brief idea of 3-NF.

BEFORE 3-NF

Sno  Courseno  Marks  Grade
580  Co-5      62     C
582  Oops-6    72     B
588  Flat-4    70     B
594  Es-3      82     A
595  Daa-2     70     B
580  Dbms-1    82     A
582  Daa-2     74     B
588  Co-5      70     B

AFTER 3-NF:-

Sno  Courseno  Marks
580  Co-5      62
582  Oops-6    72
588  Flat-4    70
594  Es-3      82
595  Daa-2     70
580  Dbms-1    82
582  Daa-2     74
588  Co-5      70

Marks  Grade
62     C
70     B
72     B
74     B
82     A
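A 3-NF violation of this kind shows up as a dependency between two non-key columns. The sketch below searches the BEFORE 3-NF rows for such dependencies; the helper names are assumptions, and as before an instance can only suggest, not prove, that a dependency holds.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this instance."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def non_key_dependencies(rows, non_key):
    """Pairs (a, b) of distinct non-key attributes where a -> b holds."""
    return [
        (a, b)
        for a in non_key
        for b in non_key
        if a != b and holds_fd(rows, (a,), (b,))
    ]


COLUMNS = ("sno", "courseno", "marks", "grade")
before = [dict(zip(COLUMNS, values)) for values in [
    (580, "Co-5", 62, "C"), (582, "Oops-6", 72, "B"),
    (588, "Flat-4", 70, "B"), (594, "Es-3", 82, "A"),
    (595, "Daa-2", 70, "B"), (580, "Dbms-1", 82, "A"),
    (582, "Daa-2", 74, "B"), (588, "Co-5", 70, "B"),
]]

# After decomposition the result table keeps only sno, courseno and marks
after = [{k: row[k] for k in ("sno", "courseno", "marks")} for row in before]

found_before = non_key_dependencies(before, ("marks", "grade"))
found_after = non_key_dependencies(after, ("marks",))
```

Before decomposition the check finds marks -> grade; once grade is moved to its own table, the result table has no dependency between non-key columns left.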

BOYCE-CODD NORMAL FORM

Let R be a relation schema, F be the set of functional dependencies given to hold over R, X be a
subset of the attributes of R and A be an attribute of R. R is in Boyce-Codd normal form (BCNF) if,
for every functional dependency X -> A in F, one of the following statements is true:

A ∈ X, i.e., it is a trivial functional dependency, or

X is a super key.

By comparison, a relation schema is in 3-NF if, whenever a functional dependency X -> A holds in R,
one of the following statements is true:

A ∈ X, i.e., it is a trivial functional dependency, or

X is a super key, or

A is a prime attribute of R (A is part of some key).

The definition of BCNF differs only slightly from that of 3-NF: the third condition of 3-NF, which
allows A to be prime, is absent in BCNF.

BCNF removes the remaining anomalies that result from functional dependencies.

In BCNF the left-hand side of every non-trivial functional dependency must be a super key, that is,
it must contain a candidate key. Put briefly, a relation is in BCNF if and only if every determinant
is a candidate key.

Every BCNF relation is in 3-NF (BCNF is a stricter form of 3-NF), but not every 3-NF relation is in
BCNF.

EXAMPLE

Consider the student_advisor relation, which is in 3-NF but not in BCNF, as shown:

Sid   Dept         Advisor  DeptGpa
202   Electrical   Sasi     9.0
402   Electronics  Kiran    9.5
500   Computers    Sandeep  10.0
1200  IT           Praveen  9.6

The student_advisor relation has the following functional dependencies.

FUNCTIONAL DEPENDENCY-1

sid, dept -> advisor, deptGpa

FUNCTIONAL DEPENDENCY-2

advisor -> dept

The candidate keys of the relation are

1. (sid, dept)
2. (sid, advisor)

The sets (sid, dept, advisor) and (sid, dept, deptGpa) are super keys but not candidate keys, because
they are not minimal.

Advisor alone is not a candidate key, yet it is the determinant of advisor -> dept, so the relation is
not in BCNF. It is in 3-NF because dept is a prime attribute.
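The two definitions can be compared mechanically on the student_advisor example. The sketch below checks only the listed functional dependencies and assumes the candidate keys are (sid, dept) and (sid, advisor), which follow from the two dependencies; the function names are hypothetical.

```python
def is_superkey(attrs, candidate_keys):
    """An attribute set is a super key if it contains some candidate key."""
    return any(key <= attrs for key in candidate_keys)


def violations(fds, candidate_keys, allow_prime):
    """Return (lhs, attribute) pairs that break the normal form.

    allow_prime=True  applies the 3-NF conditions,
    allow_prime=False applies the BCNF conditions.
    """
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        for attr in sorted(rhs - lhs):      # rhs - lhs skips the trivial part
            if is_superkey(lhs, candidate_keys):
                continue
            if allow_prime and attr in prime:
                continue
            bad.append((lhs, attr))
    return bad


fds = [
    (frozenset({"sid", "dept"}), frozenset({"advisor", "deptGpa"})),
    (frozenset({"advisor"}), frozenset({"dept"})),
]
candidate_keys = [frozenset({"sid", "dept"}), frozenset({"sid", "advisor"})]

third_nf_violations = violations(fds, candidate_keys, allow_prime=True)
bcnf_violations = violations(fds, candidate_keys, allow_prime=False)
```

No dependency breaks the 3-NF conditions, but advisor -> dept breaks BCNF: advisor is not a super key, and only the prime-attribute exception of 3-NF lets it pass.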

Consider the following example, where the functional dependency X -> A holds:

X  Y   A
x  y1  a
x  y2  ?

Since X -> A holds and the first tuple has the value a for A where X is x, the second tuple must also
have the value a for A, because its X value is also x. The value of A is thus recorded twice for the
same X value, which is redundant.

Such a situation cannot arise in BCNF. If the relation is in BCNF, X must be a key, so y1 = y2 and the
two tuples agree in all fields; since a relation is a set of tuples, they are one and the same tuple.

If a relation is in BCNF, every field of every tuple records a piece of information that cannot be
inferred (using only functional dependencies) from the values in all other fields of the relation
instance.

FOURTH NORMAL FORM:-

Fourth normal form is a direct generalization of BCNF. Let R be a relation schema, X and Y be
non-empty subsets of the attributes of R, and F be a set of dependencies that includes both FDs and
MVDs. R is said to be in fourth normal form (4NF) if, for every MVD X ->-> Y that holds over R, one
of the following statements is true:

Y ⊆ X or XY = R, or

X is a super key.

In reading this definition, it is important to understand that the definition of a key has not
changed: the key must uniquely determine all attributes through FDs alone. X ->-> Y is a trivial MVD
if Y ⊆ X ⊆ R or XY = R; such MVDs always hold.

Consider a relation schema ABCD and suppose that the FD A -> BCD and the MVD B ->-> C are given.
Considering only these dependencies, the schema has a simple key (A) and appears to be in BCNF, and
yet it is not in 4NF because B ->-> C violates the 4NF conditions. Let us take a closer look.

B  C   A   D
b  c1  a1  d1   Tuple t1
b  c2  a2  d2   Tuple t2
b  c1  a2  d2   Tuple t3

The table shows three tuples from an instance of ABCD that satisfies the given MVD B ->-> C. From the
definition of an MVD, given tuples t1 and t2, it follows that tuple t3 must also be included in the
instance. Consider tuples t2 and t3. From the given FD A -> BCD and the fact that these tuples have
the same A-value, we can derive that c1 = c2. Therefore the FD B -> C must hold over ABCD whenever
the FD A -> BCD and the MVD B ->-> C hold. If B -> C holds, the relation ABCD is not in BCNF (unless
additional FDs make B a key).
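The argument can be replayed on concrete tuples. This sketch hard-codes the column order (B, C, A, D) of the table above; the function names are assumptions. It generates the tuples that B ->-> C forces into the instance and then tests the FD A -> BCD.

```python
def mvd_required(rows):
    """Tuples that B ->-> C forces into the instance.

    Columns are (B, C, A, D): for every pair u, v with the same B value,
    the tuple taking B, C from u and A, D from v must be present.
    """
    return {(u[0], u[1], v[2], v[3]) for u in rows for v in rows if u[0] == v[0]}


def fd_a_determines_bcd(rows):
    """True if A -> BCD holds in the instance."""
    seen = {}
    for b, c, a, d in rows:
        if seen.setdefault(a, (b, c, d)) != (b, c, d):
            return False
    return True


t1 = ("b", "c1", "a1", "d1")
t2 = ("b", "c2", "a2", "d2")
t3 = ("b", "c1", "a2", "d2")

# With c1 != c2 the MVD adds t3, which clashes with t2 under A -> BCD
closed = mvd_required({t1, t2})
fd_holds_with_distinct_c = fd_a_determines_bcd(closed)

# With a single C value per B value, both dependencies can hold together
same_c = mvd_required({("b", "c", "a1", "d1"), ("b", "c", "a2", "d2")})
fd_holds_with_same_c = fd_a_determines_bcd(same_c)
```

The FD survives only when the C values coincide, which is the concrete form of the conclusion that B -> C must hold.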

Denormalization:

Denormalization is the process of attempting to optimize the read performance of a database by adding
redundant data or by grouping data. In some cases, denormalization helps cover up the inefficiencies
inherent in relational database software. A relational normalized database imposes a heavy access load
over physical storage of data even if it is well tuned for high performance.

A normalized design will often store different but related pieces of information in separate logical tables
(called relations). If these relations are stored physically as separate disk files, completing a database
query that draws information from several relations (a join operation) can be slow. If many relations are
joined, it may be prohibitively slow. There are two strategies for dealing with this. The preferred method
is to keep the logical design normalized, but allow the database management system (DBMS) to store
additional redundant information on disk to optimize query response. In this case it is the DBMS
software's responsibility to ensure that any redundant copies are kept consistent. This method is often
implemented in SQL as indexed views (Microsoft SQL Server) or materialized views (Oracle). A view
represents information in a format convenient for querying, and the index ensures that queries against the
view are optimized.

The more usual approach is to denormalize the logical data design. With care this can achieve a similar
improvement in query response, but at a cost—it is now the database designer's responsibility to ensure
that the denormalized database does not become inconsistent. This is done by creating rules in the
database called constraints, that specify how the redundant copies of information must be kept
synchronized. It is the increase in logical complexity of the database design and the added complexity of
the additional constraints that make this approach hazardous. Moreover, constraints introduce a trade-off,
speeding up reads (SELECT in SQL) while slowing down writes (INSERT, UPDATE, and DELETE).
This means a denormalized database under heavy write load may actually offer worse performance than
its functionally equivalent normalized counterpart.

A denormalized data model is not the same as a data model that has not been normalized.
Denormalization should only take place after a satisfactory level of normalization has been reached
and after any required constraints and/or rules have been created to deal with the inherent anomalies
in the design. For example, all the relations are in third normal form and any relations with join and
multivalued dependencies are handled appropriately.

Examples of denormalization techniques include:

• Materialized views, which may implement the following:
    o Storing the count of the "many" objects in a one-to-many relationship as an attribute of the
      "one" relation
    o Adding attributes to a relation from another relation with which it will be joined
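The first technique, storing a count on the "one" side, can be sketched with Python's built-in sqlite3 module. The table names, the column `n_results` and the two triggers are hypothetical and chosen only for this illustration; triggers are one possible way to keep the redundant copy synchronized.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    sno       INTEGER PRIMARY KEY,
    sname     TEXT,
    n_results INTEGER NOT NULL DEFAULT 0   -- redundant, denormalized count
);
CREATE TABLE result (
    sno      INTEGER REFERENCES student(sno),
    courseno TEXT,
    marks    INTEGER,
    PRIMARY KEY (sno, courseno)
);
-- Rules that keep the redundant count in step with the result table
CREATE TRIGGER result_added AFTER INSERT ON result
BEGIN
    UPDATE student SET n_results = n_results + 1 WHERE sno = NEW.sno;
END;
CREATE TRIGGER result_removed AFTER DELETE ON result
BEGIN
    UPDATE student SET n_results = n_results - 1 WHERE sno = OLD.sno;
END;
""")

conn.execute("INSERT INTO student (sno, sname) VALUES (580, 'Sasi')")
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?)",
    [(580, "Co-5", 62), (580, "Dbms-1", 82)],
)


def stored_count(sno):
    """Read the denormalized count: no join or aggregate needed."""
    return conn.execute(
        "SELECT n_results FROM student WHERE sno = ?", (sno,)
    ).fetchone()[0]


def computed_count(sno):
    """Recompute the count from the normalized data."""
    return conn.execute(
        "SELECT COUNT(*) FROM result WHERE sno = ?", (sno,)
    ).fetchone()[0]
```

Reads of the count become a single-row lookup, while every insert or delete on `result` now costs an extra update on `student`. This is the read versus write trade-off described above.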
