Professional Documents
Culture Documents
What is Normalization?
• Database designed based on the E-R model may have some amount of
– Inconsistency
– Uncertainty
– Redundancy
To eliminate these draw backs some refinement has to be done on the database.
• Insert Anomaly
• Delete Anomaly
• Update Anomaly
• Data Duplication
Data Normalization
avoid
satisfies certain constraints that
–Insertion
–Deletion
–Update
ANOMALIES
• Insertion
– inserting one fact in the database requires knowledge of other facts unrelated
to the fact being inserted
• Deletion
– Deleting one fact from the database causes loss of other unrelated data from
the database
• Update
– Updating the values of one fact requires multiple changes to the database
TABLE: COURSE
Deletion:
Suppose not enough students enrolled for
the course CIS570 which had only one section
072. So, the school decided to drop this section
and delete the section# 072 for CIS570 from the
table COURSE. But then, what other relevant
info also got deleted in the process?
COURSE# SECTION# C_NAME
CIS564 072 Database Design
CIS564 073 Database Design
CIS570 072 Oracle Forms
CIS564 074 Database Design
ANOMALIES EXAMPLES
Update:
Suppose the course name (C_Name) for
CIS 564 got changed to Database Management.
How many times do you have to make this
change in the COURSE table in its current
form?
COURSE# SECTION# C_NAME
CIS564 072 Database Design
Slide 10- 12
Functional Dependency
• Relationship between columns X and Y such that, given the value of X, one
can determine the value of Y. Written as X Y
i.e., for a given value of X we can obtain (or look up) a specific value of X
Bordoloi
Functional Dependency
• Example
– SOC_SEC_NBR EMP_NME
– Reg_no->name
SOC_SEC_NBR EMP_NME
T2
A 22
B 33
C 34
B 33
Trivial functional dependency
• Rno,panno->name
Examples of FD constraints (1)
• Social security number determines employee name
– SSN -> ENAME
• Project number determines project name and location
– PNUMBER -> {PNAME, PLOCATION}
• Employee ssn and project number determines the hours per week that the
employee works on the project
– {SSN, PNUMBER} -> HOURS
Examples of FD constraints (2)
• An FD is a property of the attributes in the schema R
• The constraint must hold on every relation instance r(R)
• If K is a key of R, then K functionally determines all attributes in R
– (since we never have two distinct tuples with t1[K]=t2[K])
2.2 Inference Rules for FDs (1)
• Given a set of FDs F, we can infer additional FDs that hold
whenever the FDs in F hold
• Armstrong's inference rules:
– IR1. (Reflexive) If Y subset-of X, then X -> Y
– IR2. (Augmentation) If X -> Y, then XZ -> YZ
• (Notation: XZ stands for X U Z)
– IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z
• The last three inference rules, as well as any other inference rules, can be
deduced from IR1, IR2, and IR3 (completeness property)
Inference Rules for FDs
• Given a set of FDs F, we can infer additional FDs
that hold whenever the FDs in F hold
• Armstrong's inference rules
A1. (Reflexive) If Y subset-of X, then X Y
A2. (Augmentation) If X Y, then XZ YZ
(Notation: XZ stands for X U Z)
A3. (Transitive) If X Y and Y Z, then X Z
• A1, A2, A3 form a sound and complete set of
inference rules
Additional Useful Inference Rules
• Decomposition
– If X YZ, then X Y and X Z
• Union
– If X Y and X Z, then X YZ
• Psuedotransitivity
– If X Y and WY Z, then WX Z
• Closure of a set F of FDs is the set F+ of all FDs
that can be inferred from F
NORMAL FORMS
1 NF
2NF
3NF
• BCNF (Boyce-Codd Normal Form)
• 4NF
• 5NF
• DK (Domain-Key) NF
Functional dependency
• In a given relation R, X and Y are attributes. Attribute Y is functionally
dependent on attribute X if each value of X determines EXACTLY ONE
value of Y, which is represented as X -> Y (X can be composite in nature).
• COURSE# CourseName,
• COURSE# IName (Assuming one course is taught by one and only one
Instructor)
• IName Room# (Assuming each Instructor has his/her own and non-
shared room)
• Marks Grade
Dependency diagram
Report( S#,C#,SName,CTitle,LName,Room#,Marks,Grade)
• S# SName
S# C#
• C# CTitle,
• C# LName
• LName Room# CTitle
• C# Room# SName
• S# C# Marks LName
• Marks Grade
• S# C# Grade Marks Grade
Room#
Assumptions:
• Each course has only one lecturer and each lecturer has a room.
• Grade is determined from Marks.
Full dependencies
Student#
Marks
Course#
Partial dependencies
X and Y are attributes.
Attribute Y is partially dependent on the attribute X only if it is dependent
on a sub-set of attribute X.
CourseName
Student#
IName
Course#
Room#
Transitive dependencies
– if and only if all the attributes of the relation R are atomic in nature.
– Atomic: the smallest level to which data may be broken down and remain
meaningful
• A table is in 1 NF iff:
1.There are only Single Valued Attributes.
2.Attribute Domain does not change.
3.There is a Unique name for every
Attribute/Column.
4.The order in which data is stored, does not matter.
Student_Course_Result Table
Applied
101 Davis 04-Nov-1986 M4 Mathematics Basic Mathematics 7 11-Nov-2004 82 A
Applied
102 Daniel 06-Nov-1986 M4 Mathematics Basic Mathematics 7 11-Nov-2004 62 C
Applied
104 Evelyn 22-Feb-1986 M4 Mathematics Basic Mathematics 7 11-Nov-2004 65 B
Example
• For an airline
Flight Weekdays
UA59 Mo We Fr
UA73 Mo Tu We Th Fr
1NF Solution
Flight Weekday
UA59 Mo
UA59 We
UA59 Fr
UA73 Mo
UA73 We
… …
Product_id color price
1 R,b,y 30
2 Y,w 40
3 B,w 400
Second normal form: 2NF
• A Relation is said to be in Second Normal Form if and only if :
– It is in the First normal form, and
– No partial dependency exists between non-key attributes and key
attributes.
To make a table 2NF compliant, we have to remove all the partial dependencies
Student Name
Date of Birth
CourseName
To make this table 2NF compliant, we have to remove all the partial
dependencies.
S# StudentName
Partial Dependency
S# DOB
C# CourseName
101 M4 82 A
102 M4 62 C
101 H6 79 B
103 C3 65 B
104 B3 77 B
102 P3 68 B
105 P3 89 A
103 B4 54 D
105 H6 87 A
104 M4 65 B
Second Normal form – Tables in 2 NF
Exam_Date Table
Course# DateOfExam
M4 11-Nov-04
H6 22-Nov-04
C3 16-Nov-04
B3 26-Nov-04
P3 12-Nov-04
B4 27-Nov-04
2NF
COURSE# SECTION#
CIS564 072
CIS564 073
CIS564 074
CIS570 072
COURSE
COURSE# C_NAME
CIS564 Database Design
CIS570 Oracle Forms
Third normal form: 3 NF
A relation R is said to be in the Third Normal Form (3NF) if and only if
It is in 2NF and
No transitive dependency exists between non-key attributes and
key attributes.
101 M4 82
102 M4 62
101 H6 79
103 C3 65
104 B3 77
102 P3 68
105 P3 89
103 B4 54
105 H6 87
104 M4 65
Third Normal Form – Tables in 3 NF
MARKSGRADE TABLE
UpperBound LowerBound Grade
100 95 A+
94 85 A
84 70 B
69 65 B-
64 55 C
54 45 D
44 0 E
3NF
A+ -> A,B,C,D
B+=B,C,D
C+={C,D}
D+={D}
ABCD and it can be denoted using A -> ABCD
A =ABCD
Consider a relation R ( A , B , C , D , E , F , G ) with the
functional dependencies-
A → BC
BC → DE
D→F
CF → G
A+ = { A }
= {A, B,C} ( Using A → BC )
= {A, B,C,D,E} ( Using BC → DE )
={A, B,C,D,E, F } ( Using D → F )
= { A , B , C , D , E , F , G } ( Using CF → G )
Thus,
A+ = { A , B , C , D , E , F , G }
Closure of attribute D-
D+ = { D }
= { D , F } ( Using D → F )
We can not determine any other attribute using attributes
D and F contained in the result set.
Thus,
D+ = { D , F }
Closure of attribute set {B, C}-
{ B , C }+= { B , C }
={B,C,D,E} ( Using BC → DE )
={B,C,D,E,F} ( Using D → F )
= { B , C , D , E , F , G } ( Using CF → G )
Thus,
{ B , C }+ = { B , C , D , E , F , G }
Candidate Key-
BCNF relation is a strong 3NF, but not every 3NF relation is BCNF.
A relation is said to be in Boyce Codd Normal Form (BCNF) if and only if all the determinants are candidate keys. BCNF relation
is a strong 3NF, but not every 3NF relation is BCNF.
Let us understand this concept using slightly different RESULT table structure. In the above table, we have two candidate keys
namely
r
STUDENT# COURSE# and COURSE# EmaiIId.
COURSE# is overlapping among those candidate keys.
Hence these candidate keys are called as
“Overlapping Candidate Keys”.
The non-key attributes Marks is non-transitively and fully functionally dependant on key
attributes. Hence this is in 3NF. But this is not in BCNF because there are four
determinants in this relation namely:
STUDENT# (STUDENT# decides EmailiD)
EMailID (EmailID decides STUDENT#)
STUDENT# COURSE# (decides Marks)
COURSE# EMailID (decides Marks).
All of them are not candidate keys. Only combination of STUDENT# COURSE# and
COURSE# EMailID are candidate keys.
student_id subject professor
103 C# P.Chash
101 Davis@myuni.edu M4 82
102 Daniel@myuni.edu M4 62
101 Davis@myuni.edu H6 79
103 Sandra@myuni.edu C3 65
104 Evelyn@myuni.edu B3 77
102 Daniel@myuni.edu P3 68
105 Susan@myuni.edu P3 89
103 Sandra@myuni.edu B4 54
105 Susan@myuni.edu H6 87
104 Evelyn@myuni.edu M4 65
BCNF
Candidate Keys for the relation are
S# C# and C# EmailID
S# C#
C#
C# EmailID
S#,C# Marks
C#,EMailID Marks
BCNF
STUDENT TABLE
Student# EmailID
101 Davis@myuni.edu
102 Daniel@myuni.edu
103 Sandra@myuni.edu
104 Evelyn@myuni.edu
105 Susan@myuni.edu
BCNF Tables
Student# Course# Marks
101 M4 82
102 M4 62
101 H6 79
103 C3 65
104 B3 77
102 P3 68
105 P3 89
103 B4 54
105 H6 87
104 M4 65
Merits of Normalization
3NF Relation should not have a non- Decompose and form a relation that
key attribute functionally includes the non-key attribute(s) that
determined by another non-key functionally determine(s) other non-key
attribute (or by a set of non-key attribute(s).
attributes). In other words there
should be no transitive
dependency of a non-key
attribute on the primary key.
Summary
• Normalization is a refinement process. It helps in removing anomalies present
in INSERTs/UPDATEs/DELETEs
• There are four normal forms that were defined being commonly used.
• 1NF makes sure that all the attributes are atomic in nature.
• While normalizing, use common sense and don’t use the normal forms as
absolute measures.
reg mn ho
1 M1,m2 H1,h2
2 m3 h4
Reg->->mn
1 m1 h1
1 m1 h2
1 m2 h1
Thank You!