
Normalization

Badgujar Dipak D
Integrity Constraints
• Integrity constraints are a set of rules used to maintain the quality of
information.
• Integrity constraints ensure that data insertion, updating, and other
processes are performed in such a way that data integrity is not
affected.
• Thus, integrity constraints guard against accidental damage to the
database.
Integrity Constraints
• Types of Integrity Constraints:
Domain Constraints
• A domain constraint defines the valid set of values for an attribute.
• The data type of a domain can be string, character, integer, time, date,
currency, etc. The value of the attribute must belong to the
corresponding domain.
Entity Integrity Constraints
• The entity integrity constraint states that primary key value can't be null.
• This is because the primary key value is used to identify individual rows in
relation and if the primary key has a null value, then we can't identify those
rows.
• A table can contain null values in fields other than the primary key.
Referential Integrity Constraints
• A referential integrity constraint is specified between two tables.
• Under a referential integrity constraint, if a foreign key in Table 1 refers to
the primary key of Table 2, then every value of the foreign key in Table 1
must either be null or be present in Table 2.
Key Constraints
• A key is an attribute or set of attributes used to identify an entity uniquely
within its entity set.
• An entity set can have multiple keys, of which one is chosen as the
primary key. A primary key must contain unique, non-null values in the
relational table.
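The four kinds of constraints above can be tried out with a minimal sketch using Python's built-in sqlite3 module. The dept and emp tables, their columns, and the age >= 18 rule are hypothetical, chosen only for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

def accepted(conn, sql, params):
    """Return True if the row is inserted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: dept is the referenced table, emp the referencing table.
conn.execute("CREATE TABLE dept (dept_id TEXT PRIMARY KEY NOT NULL, dept_name TEXT)")
conn.execute(
    "CREATE TABLE emp ("
    " emp_id TEXT PRIMARY KEY NOT NULL,"       # entity integrity + key constraint
    " age INTEGER CHECK (age >= 18),"          # domain constraint
    " dept_id TEXT REFERENCES dept(dept_id))"  # referential integrity
)

insert_dept = "INSERT INTO dept VALUES (?, ?)"
insert_emp = "INSERT INTO emp VALUES (?, ?, ?)"
results = {
    "valid dept": accepted(conn, insert_dept, ("D101", "Sales")),
    "null primary key": accepted(conn, insert_dept, (None, "Ghost")),
    "duplicate primary key": accepted(conn, insert_dept, ("D101", "Copy")),
    "valid emp": accepted(conn, insert_emp, ("E1", 30, "D101")),
    "null foreign key": accepted(conn, insert_emp, ("E2", 30, None)),
    "dangling foreign key": accepted(conn, insert_emp, ("E3", 30, "D999")),
    "age outside domain": accepted(conn, insert_emp, ("E4", 5, "D101")),
}
for case, ok in results.items():
    print(f"{case}: {'accepted' if ok else 'rejected'}")
```

Only the two valid rows and the row with a null foreign key are accepted. This matches the rule that a foreign key value must be null or present in the referenced table.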
Keys in DBMS
• Super Key – A super key is a set of one or more attributes that uniquely
identifies rows in a table.
EmpSSN EmpNum Empname
9812345098 AB05 Shown
9876512345 AB06 Roslyn
199937890 AB07 James

• In the above example, EmpSSN and EmpNum are super keys.


Keys in DBMS
• Primary Key – a column or group of columns in a table that uniquely
identifies every row in that table. A primary key value cannot be duplicated, meaning the
same value can't appear more than once in the table. A table cannot have more
than one primary key.
• Two rows can't have the same primary key value.
• Every row must have a primary key value.
• The primary key field cannot be null.
• The value in a primary key column cannot be modified or updated while a foreign key
refers to it, unless the change is also propagated to the referencing rows.
StudID Roll No First Name LastName Email
1 11 Tom Price abc@gmail.com
2 12 Nick Wright xyz@gmail.com
3 13 Dana Natan mno@yahoo.com
• In the above example, StudID is the primary key (PK).
Keys in DBMS
• Alternate Key – a column or group of columns in a table that uniquely
identifies every row in that table but was not chosen as the primary key.
• A table can have multiple choices for a primary key, but only one can be set
as the primary key. All the candidate keys that are not the primary key are called
alternate keys.
StudID Roll No First Name LastName Email
1 11 Tom Price abc@gmail.com
2 12 Nick Wright xyz@gmail.com
3 13 Dana Natan mno@yahoo.com
• In this table, StudID, Roll No, and Email are qualified to become a primary key. But since
StudID is the primary key, Roll No and Email become alternate keys.
Keys in DBMS
• Candidate Key – a minimal set of attributes that uniquely identifies tuples in a
table. A candidate key is a super key with no redundant attributes. The primary
key should be selected from the candidate keys. Every table must have at
least a single candidate key. A table can have multiple candidate keys but
only a single primary key.
• It must contain unique values.
• A candidate key may have multiple attributes.
• It must not contain null values.
• It should contain the minimum fields needed to ensure uniqueness.
• It uniquely identifies each record in a table.
Keys in DBMS
StudID Roll No First Name LastName Email
1 11 Tom Price abc@gmail.com
2 12 Nick Wright xyz@gmail.com
3 13 Dana Natan mno@yahoo.com
• In the given table, StudID, Roll No, and Email are candidate keys, which help us to
uniquely identify the student record in the table.
Keys in DBMS
• Compound Key – has two or more attributes that allow you to uniquely recognize a
specific record. Each column may not be unique by itself within the
database. However, when combined with the other column or columns, the
combination becomes unique. The purpose of the compound key
in a database is to uniquely identify each record in the table.
OrderNo ProductID Product Name Quantity
B005 JAP102459 Mouse 5
B005 DKT321573 USB 10
B005 OMG446789 LCD Monitor 20
B004 DKT321573 USB 15
B002 OMG446789 Laser Printer 3

• In this example, neither OrderNo nor ProductID alone can be a primary key, as neither uniquely identifies
a record. However, a compound key of OrderNo and ProductID can be used, as together they uniquely
identify each record.
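The claim can be checked mechanically against the sample rows. The sketch below uses a hypothetical helper `is_unique` (not from the slides). It only tests the rows shown, whereas a real key must be unique for every legal state of the table.

```python
def is_unique(rows, columns):
    """True if no two rows share the same values in the given columns."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True

orders = [
    {"OrderNo": "B005", "ProductID": "JAP102459", "ProductName": "Mouse", "Quantity": 5},
    {"OrderNo": "B005", "ProductID": "DKT321573", "ProductName": "USB", "Quantity": 10},
    {"OrderNo": "B005", "ProductID": "OMG446789", "ProductName": "LCD Monitor", "Quantity": 20},
    {"OrderNo": "B004", "ProductID": "DKT321573", "ProductName": "USB", "Quantity": 15},
    {"OrderNo": "B002", "ProductID": "OMG446789", "ProductName": "Laser Printer", "Quantity": 3},
]

print(is_unique(orders, ["OrderNo"]))               # False: B005 repeats
print(is_unique(orders, ["ProductID"]))             # False: DKT321573 repeats
print(is_unique(orders, ["OrderNo", "ProductID"]))  # True: the pair never repeats
```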
Keys in DBMS
• Composite Key- is a combination of two or more columns that uniquely
identify rows in a table. The combination of columns guarantees uniqueness,
though individually uniqueness is not guaranteed. Hence, they are combined
to uniquely identify records in a table.
• The difference between a compound key and a composite key is that any part of
the compound key can be a foreign key, whereas the parts of a composite key may or may
not be foreign keys.
Keys in DBMS
• Surrogate Key – an artificial key that aims to uniquely identify each record
is called a surrogate key. This kind of key is created when you don't have
any natural primary key. Surrogate keys do not lend any
meaning to the data in the table. A surrogate key is usually an integer,
generated right before the record is inserted into the
table.
Fname Lastname Start Time End Time
Anne Smith 09:00 18:00
Jack Francis 08:00 17:00
Anna McLean 11:00 20:00
Shown Willam 14:00 23:00

• The above example shows the shift timings of different employees. In this
example, a surrogate key is needed to uniquely identify each employee.
Keys in DBMS
Primary Key vs. Foreign Key
• Primary key: helps you to uniquely identify a record in the table.
• Foreign key: a field in the table that is the primary key of another table.
• Primary key: never accepts null values.
• Foreign key: may accept multiple null values.
• Primary key: in some DBMSs (for example SQL Server, by default) it is created as a clustered index, and
data in the table is physically organized in the sequence of the clustered index.
• Foreign key: in many DBMSs it does not automatically create an index, clustered or
non-clustered. However, you can manually create an index on the foreign key.
• Primary key: a table can have only a single primary key.
• Foreign key: a table can have multiple foreign keys.
Anomalies in DBMS
• There are three types of anomalies that occur when the database is not
normalized. These are
• Insertion
• Update
• Deletion
Emp_Id Emp_Name Emp_Address Emp_Dept
101 Dipak Jalgaon D101
101 Dipak Jalgaon D102
123 Aniket Pune D105
166 Shreya Nasik D108
166 Shreya Nasik D109
Anomalies in DBMS
• Update Anomalies: In the table we have two rows for employee Dipak, who
works for 2 departments of the company. If we update the address of Dipak,
we need to update it in both rows; if we do not, the data becomes inconsistent.
Emp_Id Emp_Name Emp_Address Emp_Dept
101 Dipak Jalgaon D101
101 Dipak Jalgaon D102
123 Aniket Pune D105
166 Shreya Nasik D108
166 Shreya Nasik D109
• Insert Anomalies: If a new employee has joined as a trainee and has not yet been
allocated a department, we can't insert his information if Emp_Dept does not
accept null values.
• Delete Anomalies: If the company closes the operation of Dept D105,
we lose the information of employee 123, as he is the only employee
working in that department.
Anomalies in DBMS
• To overcome these anomalies we need to normalize the data through
Normalization Process.
Functional Dependency
• A functional dependency is a relationship that exists between two attributes (or sets of attributes). It
typically exists between the primary key and a non-key attribute within a table.
• Functional dependency helps you to maintain the quality of data in the database.
• A functional dependency is denoted by an arrow →.
• The functional dependency of Y on X is represented by X → Y.
• Functional dependency plays a vital role in telling good database design from
bad database design.
• One of the attributes is called the determinant and the other attribute is called the
determined. For each value of the determinant there is associated one and only
one value of the determined.
• If A is the determinant and B is the determined, then we say that A functionally
determines B and represent this as A → B.
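The "one and only one value" condition can be tested against a table instance. The sketch below uses a hypothetical helper `fd_holds` (not from the slides) and the employee rows from the anomalies slide. Passing the check only shows that the sample data does not violate the dependency. A functional dependency is a statement about all legal data, so it cannot be proven from one instance.

```python
def fd_holds(rows, lhs, rhs):
    """True if every value of the determinant (lhs) maps to one value of rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True

employees = [
    {"Emp_Id": 101, "Emp_Name": "Dipak", "Emp_Address": "Jalgaon", "Emp_Dept": "D101"},
    {"Emp_Id": 101, "Emp_Name": "Dipak", "Emp_Address": "Jalgaon", "Emp_Dept": "D102"},
    {"Emp_Id": 123, "Emp_Name": "Aniket", "Emp_Address": "Pune", "Emp_Dept": "D105"},
    {"Emp_Id": 166, "Emp_Name": "Shreya", "Emp_Address": "Nasik", "Emp_Dept": "D108"},
    {"Emp_Id": 166, "Emp_Name": "Shreya", "Emp_Address": "Nasik", "Emp_Dept": "D109"},
]

print(fd_holds(employees, ["Emp_Id"], ["Emp_Name", "Emp_Address"]))  # True
print(fd_holds(employees, ["Emp_Id"], ["Emp_Dept"]))                 # False: 101 maps to D101 and D102
```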
Types of Functional Dependency
• Trivial functional dependency
• A → B has trivial functional dependency if B is a subset of A.
• The following dependencies are also trivial like: A → A, B → B
• Example
• Consider a table with two columns Employee_Id and Employee_Name.
• {Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
• Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies.
• Non-trivial functional dependency
• A → B is a non-trivial functional dependency if B is not a subset of A.
• When the intersection of A and B is empty, A → B is called completely non-trivial.
• Example
• ID → Name,
• Name → DOB
Types of Functional Dependency
• Multivalued dependency
• A multivalued dependency occurs when there are multiple independent
multivalued attributes in a single table. A multivalued dependency is a full constraint
between two sets of attributes in a relation. It requires that certain tuples be present in the
relation.
Car_Model Manufacture_Year Color
H1001 2017 Metallic
H1001 2017 Grey
H1005 2018 Blue
H1005 2018 Red
H166 2015 Silver

• In this example, maf_year (Manufacture_Year) and color are independent of each other but dependent on
car_model. These two columns are said to be multivalued dependent on
car_model.
• These multivalued dependencies are represented with a double arrow:
• car_model →→ maf_year and
• car_model →→ color
Types of Functional Dependency
• Transitive Dependency:
• A functional dependency is said to be transitive if it is indirectly formed by two functional
dependencies.
• X -> Z is a transitive dependency if the following three conditions hold true:
• X -> Y
• Y does not -> X
• Y -> Z
Book Author Author_age
Game of Thrones George R. R. Martin 66
Harry Potter J. K. Rowling 49
Dying of the Light George R. R. Martin 66

• A transitive dependency can only occur in a relation of three or more attributes. This
dependency helps us normalize the database into 3NF (3rd Normal Form).
• {Book} -> {Author} (if we know the book, we know the author's name)
• {Author} does not -> {Book}
• {Author} -> {Author_age}
• Therefore, as per the rule of transitive dependency, {Book} -> {Author_age} should hold. That
makes sense, because if we know the book name we can know the author's age.
Types of Functional Dependency
• Fully Functional Dependency:
• If X and Y are attribute sets of a relation, Y is fully functionally dependent on X if Y is
functionally dependent on X but not on any proper subset of X.
• Example –
In the relation ABC->D, attribute D is fully functionally dependent on ABC if it is functionally
dependent on ABC and not on any proper subset of ABC. That means that subsets of ABC like AB, BC,
A, B, etc. cannot determine D.
supplier_id item_id price
1 1 540
2 1 545
1 2 200
2 2 201
1 1 540
2 2 201
3 1 542
• From the table, we can clearly see that neither supplier_id nor item_id alone can uniquely determine
price, but supplier_id and item_id together can do so. So we can say that price is fully functionally
dependent on {supplier_id, item_id}.
• Hence the fully functional dependency: {supplier_id, item_id} -> price
Types of Functional Dependency
• Partial Functional Dependency :
• A functional dependency X->Y is a partial dependency if Y is functionally dependent on X
and Y can also be determined by some proper subset of X.
name roll_no course
Ravi 2 DBMS
Tim 3 OS
John 5 Java
• Here, we can see that each of the attributes name and roll_no alone is able to uniquely
determine course. Hence, we can say that {name, roll_no} -> course is a partial dependency.
Difference
Full Functional Dependency vs. Partial Functional Dependency
• Full: A functional dependency X->Y is a fully functional dependency if Y is functionally dependent on X and Y is not functionally
dependent on any proper subset of X.
• Partial: A functional dependency X->Y is a partial dependency if Y is functionally dependent on X and Y can also be determined by
some proper subset of X.
• Full: The non-prime attribute is functionally dependent on the whole candidate key.
• Partial: The non-prime attribute is functionally dependent on part of a candidate key.
• Full: If we remove any attribute of X, the dependency will not exist anymore.
• Partial: If we remove some attribute of X, the dependency will still exist.
• Full: Full functional dependency equates to the normalization standard of Second Normal Form.
• Partial: Partial functional dependency does not equate to the normalization standard of Second Normal Form. Rather, 2NF
eliminates the partial dependency.
• Full: An attribute A is fully functionally dependent on an attribute set B if it is functionally dependent on B
and not on any proper subset of B.
• Partial: An attribute A is partially functionally dependent on an attribute set B if it is functionally dependent on some proper subset of B.
• Full: Full functional dependency enhances the quality of the data in our database.
• Partial: Partial dependency does not enhance the data quality. It must be eliminated in order to normalize to the second normal
form.
Advantages of Functional Dependency
• Identifying functional dependencies helps to avoid data redundancy, so that the same data does not
repeat at multiple locations in the database.
• It helps you to maintain the quality of data in the database.
• It helps you to define the meanings and constraints of databases.
• It helps you to identify bad designs.
• It helps you to find the facts regarding the database design.
Inference Rules
• Armstrong's axioms are the basic inference rules.
• Armstrong's axioms are used to infer functional dependencies on a
relational database.
• An inference rule is a type of assertion. It can be applied to a set of FDs (functional
dependencies) to derive other FDs.
• Using the inference rules, we can derive additional functional dependencies from
the initial set.
• There are 6 inference rules for functional dependencies.
Inference Rules
•Reflexive Rule (IR1)
• In the reflexive rule, if Y is a subset of X, then X determines Y.
• If X ⊇ Y (X is a superset of Y) then X → Y
• Example
• X = {a, b, c, d, e}
• Y = {a, b, c}
• Since Y is a subset of X, X → Y holds.
•Augmentation Rule (IR2)
• In augmentation, if
X determines Y, then XZ determines YZ for any Z.
• If X → Y then XZ → YZ
• Example R(ABCD), if A → B then AC → BC
Inference Rules
• Transitive Rule (IR3)
• In the transitive rule, if X determines Y and Y determine Z, then X must also determine Z.
• If X → Y and Y → Z then X → Z
• Union Rule (IR4)
• Union rule says, if X determines Y and X determines Z, then X must also determine Y and
Z.
• If X → Y and X → Z then X → YZ
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X, where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
Inference Rules
• Decomposition Rule (IR5)
• Decomposition rule is also known as the project rule. It is the reverse of the union rule.
• This rule says, if X determines Y and Z, then X determines Y and X determines Z
separately.
• If X → YZ then X → Y and X → Z
1. X → YZ (given)
2. YZ → Y (using IR1)
3. X → Y (using IR3 on 1 and 2)
• Pseudo transitive Rule (IR6)
• In the pseudo transitive rule, if X determines Y and YZ determines W, then XZ determines W.
• If X → Y and YZ → W then XZ → W
1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmentation with Z)
4. XZ → W (using IR3 on 3 and 2)
Functional Dependency
• Functional Dependencies in a relation are dependent on the domain of the relation.
Consider the STUDENT relation given in Table 1.
• We know that STUD_NO is unique for each student.
• So STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE,
STUD_NO->STUD_COUNTRY and STUD_NO -> STUD_AGE all will be true.
• Similarly, STUD_STATE->STUD_COUNTRY will be true as if two records have same
STUD_STATE, they will have same STUD_COUNTRY as well.
• For relation STUDENT_COURSE, COURSE_NO->COURSE_NAME will be true as two
records with same COURSE_NO will have same COURSE_NAME.
Functional Dependency
• Functional Dependency Set: Functional Dependency set or FD set of a relation is the set
of all FDs present in the relation. For Example, FD set for relation STUDENT shown in
table :
{ STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE,
STUD_NO->STUD_COUNTRY, STUD_NO -> STUD_AGE, STUD_STATE->STUD_COUNTRY }
Functional Dependency
• Attribute Closure: The attribute closure of an attribute set is the set
of attributes which can be functionally determined from it.
• To find the attribute closure of an attribute set:
• Add the elements of the attribute set to the result set.
• Repeatedly add to the result set the elements which can be functionally determined
from the elements already in the result set, until nothing more can be added.
• The attribute closure of A is written A+.
• (STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY,
STUD_AGE}
• (STUD_STATE)+={STUD_STATE,STUD_COUNTRY}
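The two steps above can be sketched as a short loop. The function name `closure` and the representation of each FD as a (left-hand side, right-hand side) pair of sets are assumptions made for this illustration. The FD set is the one listed for the STUDENT relation.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes functionally determined by attrs."""
    result = set(attrs)  # step 1: start with the attribute set itself
    changed = True
    while changed:       # step 2: keep applying FDs until nothing new is added
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

student_fds = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_PHONE"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_NO"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
]

print(sorted(closure({"STUD_STATE"}, student_fds)))  # ['STUD_COUNTRY', 'STUD_STATE']
print(len(closure({"STUD_NO"}, student_fds)))        # 6: every attribute of STUDENT
```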
Finding Candidate & Super Key From Closure
• If the attribute closure of an attribute set contains all attributes of the relation, the
attribute set is a super key of the relation.
• If, in addition, no proper subset of this attribute set can functionally determine all attributes of
the relation, the set is also a candidate key, i.e., a minimal
super key.
• Attributes which are part of any candidate key of the relation are called
prime attributes; the others are non-prime attributes.
Finding Closure of Relation
• Given Relation R(A,B,C,D,E)
• Functional Dependencies are (A→B, B→C, C→D, D→E)
• Sol: A→C (transitivity), A→D (transitivity), A→E (transitivity), A→A (reflexivity)
• A→ABCDE (union rule)
• X+ is the closure of X: it contains all attributes which are determined by X. X may be a
single attribute or a set of attributes.
• A+={A,B,C,D,E} B+={B,C,D,E} C+={C,D,E}, D+={D,E}, E+={E}
• AD+={A,B,C,D,E} , AB+={A,B,C,D,E}, ABC+={A,B,C,D,E}
• BCD+={B,C,D,E}
• The super keys are those attribute sets whose closure contains all attributes of the relation.
• Hence every combination of attributes that contains A is a super key.
Finding Closure of Relation
•Total super keys are 2^4=16 (A together with any subset of B, C, D, E).
•The candidate key is A.
•AD is a super key, since AD+={A,B,C,D,E}, but it is not a candidate key because its
proper subset {A} is already a super key, so AD is not minimal.
The same holds for the others, such as ABC, ABD, etc.
•The prime attribute is A.
•The non-prime attributes are B, C, D, E.
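These counts can be reproduced by brute force for this small relation. The sketch below is illustrative only: the helper names are hypothetical, attributes are assumed to be single letters so that a string such as "AB" can stand for the set {A, B}, and trying every subset is exponential, so it is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def super_keys(attributes, fds):
    """Every non-empty attribute subset whose closure is the whole relation."""
    everything = set(attributes)
    return [
        set(combo)
        for size in range(1, len(attributes) + 1)
        for combo in combinations(sorted(attributes), size)
        if closure(combo, fds) == everything
    ]

def candidate_keys(attributes, fds):
    """Super keys with no proper subset that is also a super key."""
    keys = super_keys(attributes, fds)
    return [k for k in keys if not any(other < k for other in keys)]

R = "ABCDE"
F = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

keys = candidate_keys(R, F)
prime = set().union(*keys)
print(len(super_keys(R, F)))        # 16
print([sorted(k) for k in keys])    # [['A']]
print(sorted(set(R) - prime))       # ['B', 'C', 'D', 'E'] are non-prime
```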
Finding Closure of Relation
•Solve These Problems:
• R(A,B,C,D,E) FD(A→B, D→E)
• R(A,B,C,D) FD(A→B, B→C, C→A)
• R(A,B,C,D,E) FD(A→B, BC→D, E→C, D→A)
Minimal Cover/Canonical Cover
• Whenever a user updates the database, the system must check whether any
of the functional dependencies are getting violated in this process. If there is a
violation of dependencies in the new database state, the system must roll
back. Working with a huge set of functional dependencies can cause
unnecessary added computational time.
• This is where the canonical cover comes into play.
A canonical cover of a set of functional dependencies F is a simplified set of
functional dependencies that has the same closure as the original set F.
• Extraneous attributes: An attribute of a functional dependency is said to be
extraneous if we can remove it without changing the closure of the set of
functional dependencies.
Minimal Cover/Canonical Cover
• A minimal cover of a set of functional dependencies (FD) E is a minimal set of dependencies
F that is equivalent to E.
• The formal definition is: A set of FDs F is minimal if it satisfies the following conditions −
• Every dependency in F has a single attribute for its right-hand side.
• We cannot replace any dependency X->A in F with a dependency Y->A, where Y is a proper subset of X, and still have a
set of dependencies that is equivalent to F.
• We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.
• A canonical cover is also called a minimal cover, i.e., a minimal set of FDs. A set of FDs
FC is called a canonical cover of F if it is equivalent to F and each FD in FC is a −
• Simple FD.
• Left reduced FD.
• Non-redundant FD.
• Simple FD − X->Y is a simple FD if Y is a single attribute.
• Left reduced FD − X->Y is a left reduced FD if there are no extraneous attributes in X.
{Extraneous attribute: given XA->Y, A is an extraneous attribute if X->Y also holds}
• Non-redundant FD − X->Y is a non-redundant FD if it cannot be derived from F - {X->Y}.
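The three conditions suggest a three-step procedure: split the right-hand sides, left-reduce, then drop redundant FDs. The sketch below is one common way to do this and is not necessarily the "Algorithm 10.2" cited later in the slides. The function names and the input set F are hypothetical, and attributes are assumed to be single letters. A set of FDs can have more than one minimal cover, and the result depends on processing order.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Step 1 (simple FDs): one attribute on every right-hand side.
    g = [(frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in rhs]
    # Step 2 (left-reduced FDs): drop extraneous attributes from each left side.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            smaller = lhs - {b}
            if smaller and rhs <= closure(smaller, g):
                lhs = smaller
        g[i] = (lhs, rhs)
    # Step 3 (non-redundant FDs): drop any FD derivable from the others.
    result = list(dict.fromkeys(g))  # removes exact duplicates, keeps order
    for fd in list(result):
        rest = [f for f in result if f != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result

# Hypothetical input: F = {A -> BC, B -> C, A -> B, AB -> C}
F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
cover = minimal_cover(F)
for lhs, rhs in cover:
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# prints:
# A -> B
# B -> C
```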
Decomposition
• When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
• In a database, it breaks the table into multiple tables.
• If a relation is not decomposed properly, it may lead to problems like loss of
information.
• Decomposition is used to eliminate some of the problems of bad design like anomalies,
inconsistencies, and redundancy.
• Lossless Decomposition
• If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
• The lossless decomposition guarantees that the join of relations will result in the same
relation as it was decomposed.
• A decomposition is said to be lossless if the natural join of all the decomposed relations
gives back the original relation.
Decomposition
• Dependency Preserving
• It is an important constraint of the database.
• In dependency preservation, every dependency must be satisfied by at least one
decomposed table.
• If a relation R is decomposed into relations R1 and R2, then each dependency of
R must either be a part of R1 or R2, or be derivable from the combination
of the functional dependencies of R1 and R2.
• For example,
• suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC). The
relation R is decomposed into R1(ABC) and R2(AD), which is dependency preserving
because the FD A->BC is a part of relation R1(ABC).
Decomposition
• Lossless decomposition-
• No information is lost from the original relation during decomposition.
• When the sub relations are joined back, the same relation is obtained that was
decomposed.
• Every decomposition must always be lossless.
• Consider there is a relation R which is decomposed into sub relations R1 , R2 , …. , Rn.
• This decomposition is called lossless join decomposition when the join of the sub
relations results in the same relation R that was decomposed.
• For lossless join decomposition, we always have-
• R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R where ⋈ is a natural join operator
Decomposition
• Lossless decomposition- Consider the following relation R( A , B , C )
A B C
1 2 1
2 5 3
3 3 3

• Consider this relation is decomposed into two sub relations R1( A , B ) and R2( B , C )-
R1:
A B
1 2
2 5
3 3
R2:
B C
2 1
5 3
3 3
• R1 ⋈ R2 = R: we must get the same relation back.
•Lossless join decomposition is also known as non-additive join decomposition.
•This is because the resultant relation after joining the sub relations is the same as the decomposed relation.
•No extraneous tuples appear after joining of the sub-relations.


Decomposition
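• Lossy decomposition- Consider the following relation R( A , B , C )
A B C
1 2 1
2 5 3
3 3 3

• Consider this relation is decomposed into two sub relations R1( A , C ) and R2( B , C )-
R1:
A C
1 1
2 3
3 3
R2:
B C
2 1
5 3
3 3
• Here R1 ⋈ R2 ⊃ R: the join contains extraneous tuples.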
Decomposition
Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 we get-

A B C
1 2 1
2 5 3
2 3 3
3 5 3
3 3 3
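Both decompositions of R(A, B, C) can be replayed in a few lines. The sketch below represents each tuple as a dictionary and uses hypothetical helpers `project`, `natural_join`, and `as_set`. It is meant only to reproduce the two joins shown on the slides.

```python
def project(rows, attrs):
    """Projection onto attrs, with duplicate tuples removed."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]

def natural_join(left, right):
    """Natural join: combine tuples that agree on all common attributes."""
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]

def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}

R = [
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 5, "C": 3},
    {"A": 3, "B": 3, "C": 3},
]

lossless = natural_join(project(R, ["A", "B"]), project(R, ["B", "C"]))
lossy = natural_join(project(R, ["A", "C"]), project(R, ["B", "C"]))

print(as_set(lossless) == as_set(R))  # True: the join gives back exactly R
print(len(lossy))                     # 5: two extraneous tuples appear
print(as_set(lossy) > as_set(R))      # True: the join is a proper superset of R
```

The second decomposition is lossy because the common attribute C does not determine either side: the value C = 3 matches two A values and two B values, so the join pairs them in all four ways.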
Problem on Decomposition
Normalization
• Normalization is a technique of organizing the data in the database.
• Normalization is a systematic approach of decomposing tables to eliminate
data redundancy (repetition) and undesirable characteristics like insertion,
update, and deletion anomalies.
• It is a multistep process that puts data into tabular form, removing
duplicated data from the relation tables.
• Normalization is used mainly for two purposes:
• Eliminating redundant (repeated) data.
• Ensuring data dependencies make sense, i.e., data is logically stored.
Normalization
•First Normal Form(1NF):
• A relation is in first normal form if it does not contain any composite or multi-valued
attribute, i.e., if every attribute in the relation is a single-valued attribute. A relation
that contains a composite or multi-valued attribute violates first normal form.
RollNo Name Course
401 Dipak DBMS,CN
402 Sandip TOC
403 Ramesh DS,SE
• Course contains multiple values, hence the table is not in 1NF.
• To convert it into first normal form, we must insert the data in the following manner:
RollNo Name Course
401 Dipak DBMS
401 Dipak CN
402 Sandip TOC
403 Ramesh DS
403 Ramesh SE
• Or we can create additional columns Course2, Course3: if data is present we store it there,
otherwise we store null values.
• Or we can split the table into 2 tables: (RollNo, Name) and (RollNo, Course).
Normalization
•Second Normal Form(2NF):
• To be in second normal form, a relation must be in first normal form and
relation must not contain any partial dependency. A relation is in 2NF if it
has No Partial Dependency, i.e., no non-prime attribute (attributes which are
not part of any candidate key) is dependent on any proper subset of any
candidate key of the table.
• Partial Dependency – If the proper subset of candidate key determines
non-prime attribute, it is called partial dependency.
Normalization
•Second Normal Form(2NF):
Cust_ID Store_ID Location
1 1 Delhi
1 3 Mumbai
2 1 Delhi
3 2 Pune
4 3 Mumbai
• In the table, the candidate key is (Cust_ID, Store_ID), whose attributes are also the prime attributes.
The non-prime attribute is Location, which depends on Store_ID alone, not on the whole candidate key.
Hence the table has a partial dependency.
• To convert this into 2NF we must split the table into Cust(Cust_ID, Store_ID) and
Loc(Store_ID, Location). Now the non-prime attribute Location depends on Store_ID, the key of Loc.
Cust:
Cust_ID Store_ID
1 1
1 3
2 1
3 2
4 3
Loc:
Store_ID Location
1 Delhi
3 Mumbai
2 Pune
Normalization
•Third Normal Form(3NF):
• The table must be in 2NF and must have no transitive dependency; then the
table is said to be in 3NF.
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal
• Candidate key: EMP_ID. Non-prime attributes: all attributes except EMP_ID.
• Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime
attributes (EMP_STATE, EMP_CITY) are transitively dependent on the key (EMP_ID). This violates the rule of third normal
form.
Normalization
•Third Normal Form(3NF):
EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010


333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
Normalization
• Boyce Codd normal form (BCNF)
• BCNF is the advanced version of 3NF. It is stricter than 3NF.
• A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
• For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283


264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549

In the above table, the functional dependencies are as follows:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
Normalization
• To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
364 UK
EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
• The decomposition is now in BCNF, because the left-hand side of each functional dependency is a key of its table.
Normalization
• 4NF: The table must be in BCNF and must have no multivalued dependency.
• Multivalued Dependency: A table is said to have a multivalued dependency if the
following conditions are true:
• For a dependency A →→ B, multiple values of B exist for a single value of A.
• The table has at least 3 columns.
• For R(A,B,C), if there is a multivalued dependency between A and B, then B and C are
independent of each other.
• If all these conditions are true for a relation, it is said to have a multivalued
dependency.
• E.g. R(Name, Mobile, Email)
• If a student has many mobile numbers and many email addresses, then a multivalued dependency
exists.
• Split the table into R1(Name, Mobile) and R2(Name, Email).
Normalization
• 5NF: The table must be in 4NF and must not contain any join dependency, i.e., it cannot be losslessly decomposed into smaller tables any further.
Algorithms for Relational Database Schema Design

■ Algorithm 11.2: Relational Synthesis into 3NF with Dependency


Preservation (Relational Synthesis Algorithm)

Input: A universal relation R and a set of functional
dependencies F on the attributes of R.
1. Find a minimal cover G for F (use Algorithm 10.2);
2. For each left-hand-side X of a functional dependency that appears in G,
create a relation schema in D with attributes {X ∪ {A1} ∪ {A2} ...
∪ {Ak}},
where X → A1, X → A2, ..., X → Ak are the only dependencies in G with
X as left-hand-side (X is the key of this relation);
3. Place any remaining attributes (that have not been placed in any relation)
in a single relation schema to ensure the attribute preservation property.

Algorithms for Relational Database Schema Design (2)


Algorithm 11.3: Relational Decomposition into BCNF with
Lossless (non-additive) join property

Input: A universal relation R and a set of functional
dependencies F on the attributes of R.
1. Set D := {R};
2. While there is a relation schema Q in D that is not in BCNF
do {
choose a relation schema Q in D that is not in BCNF;
find a functional dependency X → Y in Q that violates BCNF;
replace Q in D by two relation schemas (Q - Y) and (X ∪ Y);
};

Assumption: No null values are allowed for the join attributes.
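The loop of Algorithm 11.3 can be sketched as follows. This is an illustrative version under stated assumptions: the names `closure`, `find_violation`, and `bcnf_decompose` are hypothetical, and a violating X → Y inside Q is found by brute force over subsets of Q, which is exponential and only suitable for small schemas. The input is the employee relation from the BCNF slides.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(q, fds):
    """Return (X, Y) with X -> Y non-trivial in Q and X not a super key of Q."""
    for size in range(1, len(q)):
        for combo in combinations(sorted(q), size):
            x = frozenset(combo)
            determined = closure(x, fds) & q  # what X determines inside Q
            y = determined - x
            if y and determined != q:
                return x, y
    return None

def bcnf_decompose(attributes, fds):
    d = [frozenset(attributes)]              # step 1: D := {R}
    while True:
        for q in d:
            violation = find_violation(q, fds)
            if violation:
                x, y = violation
                d.remove(q)
                d += [q - y, x | y]          # replace Q by (Q - Y) and (X u Y)
                break
        else:
            return d                         # no schema violates BCNF any more

attributes = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
fds = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]
schemas = bcnf_decompose(attributes, fds)
for schema in schemas:
    print(sorted(schema))
```

For this input the result consists of the three schemas (EMP_ID, EMP_COUNTRY), (EMP_DEPT, DEPT_TYPE, EMP_DEPT_NO), and (EMP_ID, EMP_DEPT). In general, the schemas produced depend on which violating dependency is chosen first.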


Algorithms for Relational Database Schema Design (3)


Algorithm 11.4 Relational Synthesis into 3NF with Dependency
Preservation and Lossless (Non-Additive) Join Property

Input: A universal relation R and a set of functional
dependencies F on the attributes of R.
1. Find a minimal cover G for F (Use Algorithm 10.2).
2. For each left-hand-side X of a functional dependency that appears in
G,
create a relation schema in D with attributes {X ∪ {A1} ∪ {A2} ...
∪ {Ak}},
where X → A1, X → A2, ..., X → Ak are the only dependencies in
G with X as left-hand-side (X is the key of this relation).
3. If none of the relation schemas in D contains a key of R, then create
one more relation schema in D that contains attributes that form a key
of R. (Use Algorithm 11.4a to find the key of R)
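Steps 2 and 3 can be sketched as follows, assuming that step 1 has already been done and a minimal cover G is passed in. The names `find_key` and `synthesize_3nf` are hypothetical. The key search shown here, which drops attributes one at a time while the remainder is still a super key, is one simple method and is not claimed to be the textbook's Algorithm 11.4a.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_key(attributes, fds):
    """Shrink the full attribute set to one candidate key of R."""
    key = set(attributes)
    for a in sorted(attributes):
        if closure(key - {a}, fds) == set(attributes):
            key.discard(a)  # a is not needed to determine everything
    return frozenset(key)

def synthesize_3nf(attributes, cover):
    # Step 2: one schema per distinct left-hand side X: X plus all A with X -> A.
    groups = {}
    for lhs, rhs in cover:
        groups.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    schemas = [frozenset(s) for s in groups.values()]
    # Step 3: add a key schema if no schema already contains a key of R.
    if not any(closure(s, cover) == set(attributes) for s in schemas):
        schemas.append(find_key(attributes, cover))
    return schemas

attributes = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
G = [  # assumed to be a minimal cover already
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE"}),
    ({"EMP_DEPT"}, {"EMP_DEPT_NO"}),
]
design = synthesize_3nf(attributes, G)
for schema in design:
    print(sorted(schema))
```

For the employee relation this yields the same three schemas as the BCNF example: one per left-hand side, plus the key schema (EMP_ID, EMP_DEPT) added by step 3.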
