
INTRODUCTION TO SCHEMA REFINEMENT

• Schema refinement is a technique for organizing data in the database. It is a systematic approach of
decomposing tables to eliminate data redundancy and anomalies such as insertion,
deletion and update anomalies.
• Schema refinement is intended to address redundancy problem. It uses refinement
approach based on decompositions.
• Storing the same information redundantly, that is, in more than one place within a
database, can lead to several problems.
• Problems caused by redundancy:
Redundant Storage: The same information is stored repeatedly.
Update Anomalies: If one copy of such repeated data is updated, an inconsistency is
created unless all copies are similarly updated.
Insertion Anomalies: It may not be possible to store certain information unless some
other, unrelated, information is stored as well.
Deletion Anomalies: It may not be possible to delete certain information without
losing some other, unrelated, information as well.
Consider a relation
• Hourly_Emps(ssn, name, lot, rating, hourly_wages, hours_worked)
• Suppose that the hourly_wages attribute is determined by the rating
attribute. That is, for a given rating value, there is only one
permissible hourly_wages value.
• This integrity constraint (IC) is an example of a functional dependency. It leads to possible
redundancy in the relation Hourly_Emps, as illustrated in Figure.
• Redundant Storage: the rating value 8 corresponds to the
hourly wage 10, and this association is repeated three
times.
• Update Anomalies: The hourly_wages in the first tuple could
be updated without making a similar change in the second
tuple.
• Insertion Anomalies: We cannot insert a tuple for an
employee unless we know the hourly wage for the
employee's rating value.
• Deletion Anomalies: If we delete all tuples with a given
rating value (e.g., we delete the tuples for Smethurst and
Guldu) we lose the association between that rating value
and its hourly_wages value.
Null Values
• Null values cannot help eliminate redundant storage or update
anomalies. It appears that they can address insertion and
deletion anomalies.
• For instance, to deal with the insertion anomaly example, we
can insert an employee tuple with null values in the hourly
wage field.
• However, null values cannot address all insertion anomalies. For
example, we cannot record the hourly wage for a rating
unless there is an employee with that rating, because we
cannot store a null value in the ssn field, which is a primary
key field.
• Similarly, to deal with the deletion anomaly example, we
might consider storing a tuple with null values in all fields
except rating and hourly_wages. However, this solution does
not work because primary key fields cannot be null. Thus,
null values do not provide a general solution to the problems
of redundancy, even though they can help in some cases.
Decompositions
• Intuitively, redundancy arises when a relational schema forces an
association between attributes that is not natural.
• Functional dependencies can be used to identify such situations.
• A decomposition of a relation schema R consists of replacing the
relation schema by smaller relation schemas that each contain a
subset of the attributes of R and together include all attributes in
R.
• We can decompose Hourly_Emps into two relations:
Hourly_Emps(ssn, name, lot, rating, hours_worked)
Wages(rating, hourly_wages)
• Note that we can easily record the hourly wage for any rating simply by adding a tuple to
Wages, even if no employee with that rating appears in the current instance of
Hourly_Emps.
• Changing the wage associated with a rating involves updating a single Wages tuple. This is
more efficient than updating several tuples (as in the original design), and it eliminates the
potential for inconsistency.
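The decomposition above can be sketched in a few lines of Python. This is only an illustration: the rows are invented (the figure with the real instance is not reproduced here), and the variable names emps, wages and rejoined are assumptions made for the example.

```python
# Hypothetical instance of the original Hourly_Emps relation (invented rows).
# Columns: (ssn, name, lot, rating, hourly_wages, hours_worked)
hourly_emps = [
    ("111", "Ann", 48, 8, 10, 40),
    ("222", "Bob", 22, 8, 10, 30),
    ("333", "Cy", 35, 5, 7, 30),
    ("444", "Dee", 35, 5, 7, 32),
    ("555", "Eve", 35, 8, 10, 40),
]

# Project onto the two smaller schemas; sets drop duplicate tuples.
emps = {(ssn, name, lot, rating, hours)
        for ssn, name, lot, rating, _, hours in hourly_emps}
wages = {(rating, wage) for _, _, _, rating, wage, _ in hourly_emps}

# Each rating/wage pair is now stored once, and a wage can be recorded
# for a rating that no employee currently has.
wages.add((9, 12))

# The natural join on rating rebuilds the original tuples.
wage_of = dict(wages)
rejoined = {(ssn, name, lot, rating, wage_of[rating], hours)
            for ssn, name, lot, rating, hours in emps}
```

Here the pair (8, 10) is stored once in wages instead of three times, so changing that wage touches a single tuple.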
Types of Decomposition

• There are two types of decomposition:
1) Lossy Decomposition
2) Lossless Join Decomposition

1) Lossy Decomposition
• "The decomposition of relation R into R1 and R2 is lossy when
the join of R1 and R2 does not yield the same relation as in R."
• One of the disadvantages of decomposition into two or more
relational schemes (or tables) is that some information is lost
during retrieval of the original relation, or inconsistency is introduced
into the original table.
EXAMPLE: Consider a table STUDENT with three attributes: roll_no, sname
and department.
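As a small sketch of a lossy decomposition, assume STUDENT is split into R1(roll_no, sname) and R2(sname, department). The rows below are invented for illustration; two students share a name, so sname is not a key of either part.

```python
# Hypothetical STUDENT instance: (roll_no, sname, department)
student = {
    (1, "Asha", "CSE"),
    (2, "Asha", "ECE"),
    (3, "Ravi", "CSE"),
}

# Decompose on sname, which is NOT a key of either part.
r1 = {(roll, name) for roll, name, _ in student}
r2 = {(name, dept) for _, name, dept in student}

# Natural join of R1 and R2 on sname.
joined = {(roll, name, dept)
          for roll, name in r1
          for name2, dept in r2
          if name == name2}

# Tuples that the join produces but the original relation never contained.
spurious = joined - student
```

The join returns more tuples than the original table. The extra (spurious) tuples mean we can no longer tell which department each Asha belongs to, which is the information loss.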
2) Lossless Join Decomposition
• "The decomposition of relation R into R1 and R2 is lossless
when the join of R1 and R2 yields the same relation as in R."
• A relational table is decomposed (or factored) into two or more
smaller tables, in such a way that the designer can capture the
precise content of the original table by joining the decomposed
parts.
• This is called a lossless-join (or non-additive join) decomposition.
• The lossless-join decomposition is always defined with respect
to a specific set F of dependencies.
• The decomposition of R into R1 and R2 is a lossless join with respect to a
set of functional dependencies F if, for every instance r that satisfies F,
joining the projections of r onto R1 and R2 gives back exactly r.
• For a decomposition into two relations, check:
1. R1 ∪ R2 = R
2. R1 ∩ R2 ≠ ∅
3. (R1 ∩ R2) → R1 or (R1 ∩ R2) → R2 holds in F+. In other words, the common
attributes should be a candidate key or super key of either R1 or R2 or both.
EXAMPLE: Consider a table STUDENT with three attributes: roll_no,
sname and department.
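The test for a two-way lossless join can be sketched as follows. The FD roll_no → sname, department is an assumption about STUDENT made for this example, and the helper names closure and is_lossless are illustrative, not part of any library.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary test: the common attributes must determine all of R1 or all of R2."""
    common = set(r1) & set(r2)
    if not common:
        return False
    determined = closure(common, fds)
    return set(r1) <= determined or set(r2) <= determined


# Assumed FD for STUDENT: roll_no determines the other two attributes.
fds = [({"roll_no"}, {"sname", "department"})]

good = is_lossless({"roll_no", "sname"}, {"roll_no", "department"}, fds)
bad = is_lossless({"roll_no", "sname"}, {"sname", "department"}, fds)
```

Splitting on roll_no passes the test because roll_no is a key of both parts. Splitting on sname fails because sname determines nothing else.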
Problems Related to Decomposition
• Unless we are careful, decomposing a relation schema can create more
problems than it solves.
• Two important questions must be asked repeatedly:
• 1. Do we need to decompose a relation?
• Several normal forms have been proposed for relations. If a relation schema
is in one of these normal forms, we know that certain
kinds of problems cannot arise. Considering the normal form of a given
relation schema can help us to decide whether or not to decompose it
further.
• If we decide that a relation schema must be decomposed further, we must
choose a particular decomposition (i.e., a particular collection of smaller
relations to replace the given relation).
• 2. What problems (if any) does a given decomposition cause?
• The lossless-join property enables us to recover any instance of the
decomposed relation from corresponding instances of the smaller relations.
• The dependency-preservation property enables us to enforce any
constraint on the original relation by simply enforcing constraints on
each of the smaller relations.
• That is, we need not perform joins of the smaller relations to check
whether a constraint on the original relation is violated.
PROPERTIES OF DECOMPOSITION

Following are the properties of Decomposition,

1. Lossless Decomposition

2. Dependency Preservation

3. Lack of Data Redundancy


1. Lossless Decomposition
• Decomposition must be lossless.
• It means that no information is lost from the relation that is decomposed, and
no spurious tuples appear when the tables are joined again.
• It guarantees that the join results in the same relation that was decomposed.
Example:
• Let E be a relation schema with instance e.
• E is decomposed into E1, E2, E3, ..., En, with instances e1, e2, e3, ..., en.
• If e1 ⋈ e2 ⋈ e3 ⋈ ... ⋈ en = e, then it is called a 'Lossless Join Decomposition'.
• That is, if the natural join of all the decomposed relations gives back the
original relation, the decomposition is said to be a lossless join decomposition.
2. Dependency Preservation
• Dependency is an important constraint on the database.
• Every dependency in F should be enforceable on the decomposed tables: it must
either hold within a single decomposed table or be implied by the dependencies that do.
• If A → B holds, it is much easier to check when both A and B are in the same
relation.
• A decomposition has this property only if the functional dependencies
are maintained.
• This property allows updates to be checked without computing the
natural join of the decomposed relations.
• Let R(A,B,C,D) be a relation with the FDs:
• {A → B, B → C, C → D, D → B}
• R is now decomposed into R1(AB), R2(BC), R3(BD)
• When a relation is decomposed, the FDs are also divided between the
decomposed tables. The closure of the union of these FDs should
equal the closure of the original set of FDs.
• This means the dependencies are preserved
• Let's check for this relation:
R1(AB): A → B
R2(BC): B → C, C → B
R3(BD): B → D, D → B
So, the FDs that hold on the decomposed relations are
{A → B, B → C, C → B, B → D, D → B}
Now check this set against the original set of FDs
{A → B, B → C, C → D, D → B}
A → B, B → C and D → B appear directly. C → D does not appear, but it follows by
transitivity from C → B and B → D. Every original FD is therefore implied, so
the dependencies are preserved here.
Thus this is a dependency-preserving decomposition.
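The check above can be automated. The sketch below projects the FDs onto each decomposed schema by brute force and then tests whether every original FD follows from the union. The function names are assumptions for illustration, and the brute-force projection is exponential in the schema size, so it only suits small examples like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project(fds, schema):
    """FDs X -> Y implied by fds with both X and Y inside schema."""
    projected = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for lhs in combinations(attrs, size):
            rhs = (closure(lhs, fds) & set(schema)) - set(lhs)
            if rhs:
                projected.append((set(lhs), rhs))
    return projected


def preserves(fds, schemas):
    """True if every FD in fds follows from the union of the projections."""
    union = [fd for schema in schemas for fd in project(fds, schema)]
    return all(set(rhs) <= closure(lhs, union) for lhs, rhs in fds)


# The example from the text: R(A,B,C,D) split into AB, BC, BD.
F = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "B")]
ok = preserves(F, ["AB", "BC", "BD"])

# A contrasting (hypothetical) case: R(A,B,C) with A -> B, B -> C split into AB, AC.
not_ok = preserves([("A", "B"), ("B", "C")], ["AB", "AC"])
```

In the second case B → C cannot be checked on either AB or AC alone, so that decomposition is not dependency preserving.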
3. Lack of Data Redundancy
• Data redundancy is also known as repetition of
information.
• A proper decomposition should not suffer from any data
redundancy.
• A careless decomposition may cause problems with the
data.
• The lack of data redundancy property may be achieved by
the normalization process.
FUNCTIONAL DEPENDENCIES
• A functional dependency (FD) is a kind of IC that generalizes the
concept of a key.
• FD is a method which describes the relationship between the
attributes.
• Let R be a relation schema and let X and Y be non-empty sets of
attributes in R.
• We say that an instance r of R satisfies the FD X->Y if the following
holds for every pair of tuples t1 and t2 in r.
If t1.X = t2.X, then t1.Y = t2.Y.
• FD X ->Y essentially says that if two tuples agree on the values in
attributes X, they must also agree on the values in attributes Y.
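The definition translates directly into a check on a single instance. The sketch below is illustrative only: the function name satisfies and the sample rows are assumptions. Note that an instance can only refute an FD; satisfying it on one instance does not prove that the FD holds on the schema.

```python
def satisfies(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen, else returns the old value.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical Student rows.
students = [
    {"sid": 1, "supervisor_id": "Dimitris", "specialization": "Databases"},
    {"sid": 2, "supervisor_id": "Dimitris", "specialization": "Databases"},
    {"sid": 3, "supervisor_id": "Maria", "specialization": "Networks"},
]

holds = satisfies(students, ["supervisor_id"], ["specialization"])

# Adding a row with the same supervisor but another specialization breaks the FD.
broken = satisfies(
    students + [{"sid": 4, "supervisor_id": "Dimitris", "specialization": "Networks"}],
    ["supervisor_id"],
    ["specialization"],
)
```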
Functional Dependencies (FD) - Definition
• Let R be a relation schema and X, Y be sets of attributes in R.
• A functional dependency from X to Y exists if and only if:
• For every instance r of R, if two tuples in r agree on the values of the
attributes in X, then they agree on the values of the attributes in Y
• We write X → Y and say that X determines Y
• Example on Student (sid, name, supervisor_id, specialization):
• {supervisor_id} → {specialization} means
• If two student records have the same supervisor (e.g., Dimitris), then their specialization
(e.g., Databases) must be the same
• On the other hand, if the supervisors of 2 students are different, we do not care about
their specializations (they may be the same or different).
• FD:
• supervisor_id → specialization
Trivial FDs
• A functional dependency X → Y is trivial if Y is a subset of X (so Y ∩ X ≠ ∅)
• {name, supervisor_id} → {name}
• If two records have the same values on both the name and
supervisor_id attributes, then they obviously have the same name.
• Trivial dependencies hold for all relation instances
• A functional dependency X → Y is non-trivial if Y is not a subset of X.
When Y ∩ X = ∅ it is called completely non-trivial.
• {supervisor_id} → {specialization}
• Non-trivial FDs are given implicitly in the form of constraints when
designing a database.
• For instance, the specialization of a student must be the same as that
of the supervisor.
• They constrain the set of legal relation instances. For instance, if I try
to insert two students under the same supervisor with different
specializations, the insertion will be rejected by the DBMS
Functional Dependencies and Keys
• An FD is a generalization of the notion of a key.
• For Student (sid, name, supervisor_id, specialization),
we write:
• {sid} → {name, supervisor_id, specialization}
• The sid determines all attributes (i.e., the entire record)
• If two tuples in the relation student have the same sid, then they must have the same values
on all attributes.
• In other words they must be the same tuple (since the relational model does not allow
duplicate records)
Properties of Functional Dependency:
Author    No   Title   Publisher   ISBN   Pid   First name
Korth     1    DBMS    PHB         20     P1    Pooja
Siberth   2    DBMS    PHB         20     P1    Pooja
ABC       3    DBMS    PHB         20     P1    Pooja

ISBN → Title
ISBN, No → Author
Pid → Publisher
Armstrong rules/Axioms
1. Reflexivity: If Y is a subset of X, then X → Y.
EX: ISBN, no → ISBN
Cust_name, loan no → cust_name
2. Augmentation: If X → Y, then XZ → YZ for any Z.
EX: ISBN → title
ISBN, no → title, no
3. Transitivity: If X → Y and Y → Z, then X → Z.
EX: ISBN → publisher
publisher → pid
ISBN → pid
4. Self-determination: X → X.
EX: author → author
5. Decomposition: If X → YZ, then X → Y and X → Z. The determinant of an FD
uniquely determines each individual attribute on the right-hand side.
EX: Pincode → state, city
Pincode → state
Pincode → city
6. Union: If X → Y and X → Z, then X → YZ. If two FDs have the
same determinant, they can be combined into a new FD.
EX: ISBN → publisher
ISBN → pid
ISBN → publisher, pid
Superkeys and Candidate Keys
• A set of attributes that determine the entire tuple is a superkey
• {sid, name} is a superkey for the student table.
• Also {sid, name, supervisor_id} etc.
• A minimal set of attributes that determines the entire tuple is a candidate key
• {sid, name} is not a candidate key because I can remove the name.
• sid is a candidate key
• If there are multiple candidate keys, the DB designer designates one as the primary key.
Reasoning about Functional Dependencies
It is sometimes possible to infer new functional dependencies from a set of given functional dependencies
• independently of any particular instance of the relation schema and of any additional knowledge

Example:
From
{sid} → {first_name} and
{sid} → {last_name}

We can infer
{sid} → {first_name, last_name}
Reasoning About FDs
• There are two ways to reason about FDs:
• 1. CLOSURE OF A SET OF FDs
• 2. ATTRIBUTE CLOSURE

• 1. CLOSURE OF A SET OF FDs (FDC):
• The set of all FDs implied by a given set F of FDs is called the closure of F
and is denoted as F+.
• An important question is how we can infer, or compute, the closure of a
given set F of FDs.
• The following three rules, called Armstrong's Axioms, can be applied
repeatedly to infer all FDs implied by a set F of FDs.
CLOSURE OF A SET OF FDs
• Given some FDs, we can usually infer or compute
additional FDs:
• ssn → did, did → lot implies ssn → lot
• An FD f is implied by a set of FDs F if f holds whenever all
FDs in F hold.
• F+ = closure of F is the set of all FDs that are implied by F.
• Armstrong's Axioms (AA), where X, Y, Z are sets of
attributes:
• Reflexivity: If Y ⊆ X, then X → Y
• Augmentation: If X → Y, then XZ → YZ for any Z
• Transitivity: If X → Y and Y → Z, then X → Z
• These are sound and complete inference rules for FDs!
• Why are Armstrong's Axioms called sound and complete?
By sound, we mean that given a set of functional dependencies F
specified on a relation schema R, any dependency that we can infer
from F by using the primary rules of Armstrong's Axioms holds in every
relation state r of R that satisfies the dependencies in F.

By complete, we mean that using the primary rules of Armstrong's Axioms
repeatedly to infer dependencies until no more dependencies can be
inferred results in the complete set of all possible dependencies that
can be inferred from F.
• A couple of additional rules (that follow from AA):
• Union: If X → Y and X → Z, then X → YZ
• Decomposition: If X → YZ, then X → Y and X → Z

• Example:
• Contracts(cid,sid,jid,did,pid,qty,value), abbreviated CSJDPQV, and:
• C is the key: C → CSJDPQV
• Project purchases each part using single contract: JP → C
• Dept purchases at most one part from a supplier: SD → P
• JP → C, C → CSJDPQV imply JP → CSJDPQV
• SD → P implies SDJ → JP
• SDJ → JP, JP → CSJDPQV imply SDJ → CSJDPQV
2. ATTRIBUTE CLOSURE (AC)

• The closure of a set of attributes X (attribute closure) under a set of
functional dependencies F is the set of attributes that are
functionally dependent on X. It is
denoted X+.

• Properties of a Key:
• A key X of a relation R has the following
properties:
• 1. X ⊆ R
• 2. X → R (complete key)
• 3. There is no X′ ⊂ X such that X′ → R (minimal key)
Example for producing a FD based on AA
• Given a relation R with attributes W, U, V, X, Y, Z and functional
dependencies: W → UV, U → Y, VX → YZ. Prove that WX → Z holds.

• Solution:
• 1. (with decomposition) From W → UV we obtain W → U and W → V.
• 2. (with augmentation) From W → V we get WX → VX.
• 3. (with transitivity) Using WX → VX and VX → YZ we get WX → YZ.
• 4. (with decomposition) WX → Y and WX → Z.
Algorithm to find the Closure of a Set of attributes X

• Given a relation R and its set of functional dependencies F,
find the closure of a set of attributes A.

• 1. Let X = A.
• 2. Among the functional dependencies of F, search for dependencies
C → D where C ⊆ X. If such a dependency is found, add D to X.
• 3. Repeat Step 2 until no more attributes can be
added to X.
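The three steps can be sketched in Python as follows. The function name closure and the representation of an FD as a (left side, right side) pair of strings are choices made for this example. The FDs used are V → Z, VZ → W, W → Y, VY → W over R = (V, Y, Z, W).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    x = set(attrs)                      # Step 1: X = A
    changed = True
    while changed:                      # Step 3: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:            # Step 2: find C -> D with C inside X
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


# Each attribute is one character, so a string stands for a set of attributes.
fds = [("V", "Z"), ("VZ", "W"), ("W", "Y"), ("VY", "W")]

v_plus = closure("V", fds)
w_plus = closure("W", fds)
```

Starting from V the loop adds Z, then W, then Y, so V determines every attribute of R. Starting from W only Y is added.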
Example 1
• Let R = (V, Y, Z, W) and F = {V → Z, VZ → W, W → Y, VY → W}
• Find the closure of attribute V.
• V+ = {V, Z, W, Y}
• Solution:

• Step 1: X = V.
• Step 2: X = VZ because of V → Z.
• Step 3: X = VZW because of VZ → W.
• Step 4: X = VZWY because of W → Y.
• Step 5: No more attributes can be added.
Example 2
• Let R = (V, Y, Z, W) and F = {V → Y, W → Y, V → W}
• Find the closure of attribute V.
• V+ = {V, Y, W}
• Solution:

• Step 1: X = V.
• Step 2: X = VY because of V → Y.
• Step 3: X = VYW because of V → W.
• Step 4: No more attributes can be added.
Example 3
• Finding all the candidate keys in a relation R(ABCD)
• FD = {A → B, B → C, C → D}
• A candidate key is a minimal set of attributes that determines every attribute in the table
• A+ = {ABCD} (i.e., A is a candidate key)
• B+ = {BCD} (i.e., B is not a candidate key)
• C+ = {CD} (i.e., C is not a candidate key)
• D+ = {D} (i.e., D is not a candidate key)
• CK = {A}
• Prime Attribute: an attribute that is part of some candidate key.
• So the prime attributes for this relation are: {A}
• Non-prime attributes are: {B,C,D}
• AB+ = {ABCD}
• AB determines every attribute, so it is a superkey, but it cannot be a CK because a CK is always minimal
• A is a CK already; if anything is added to it, it becomes a superkey but not a CK.
Example 4
• Let R be a relation R(ABCD)
• FD = {A → B, B → C, C → D, D → A}
• A+ = {ABCD}
• B+ = {BCDA}
• C+ = {CDAB}
• D+ = {DABC}
• So each of A, B, C and D is a candidate key on its own: CK = {A, B, C, D}
• Prime attributes are: {A,B,C,D}
• Non-prime attributes are: {}
Example 5
Consider the relation scheme R(A,B,C,D) with functional dependencies
{A} → {C} and {B} → {D}.
• {A}+ = {A,C}
• {B}+ = {B,D}
• {C}+ = {C}
• {D}+ = {D}
• {A,B}+ = {A,B,C,D}, so {A,B} is the candidate key
Example 6
• Consider the relation scheme R(A,B,C,D,E) with functional dependencies {A → B, BC → D, E → C,
D → A}
• Attributes appearing on a right-hand side = {B, D, C, A}
• First check the right-hand sides: an attribute that never appears on a right-hand side cannot be derived, so it must be part of every candidate key.
• E is therefore part of every candidate key
• E+ = {E, C}, so E alone is not a key
• Pair E with each of the other attributes:
• AE+ = {A, B, E, C, D}
• BE+ = {B, E, C, D, A}
• CE+ = {C, E}
• DE+ = {D, E, A, B, C}
• Candidate keys: {AE, BE, DE}
• Prime attributes: {A, B, D, E}
• Non-prime attributes: {C}
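The search for candidate keys in these examples can be sketched with a brute-force routine that tries attribute sets in order of increasing size and skips supersets of keys already found. The function names are illustrative, and the approach is exponential, so it is meant only for small schemas like these.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            # A superset of a smaller key is a superkey, not a candidate key.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attrs:
                keys.append(candidate)
    return keys


# Example 6: R(A,B,C,D,E) with A -> B, BC -> D, E -> C, D -> A
keys6 = candidate_keys("ABCDE", [("A", "B"), ("BC", "D"), ("E", "C"), ("D", "A")])
prime6 = set().union(*keys6)

# Example 4: R(A,B,C,D) with A -> B, B -> C, C -> D, D -> A
keys4 = candidate_keys("ABCD", [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")])
```

For Example 6 this finds AE, BE and DE, leaving C as the only non-prime attribute. For Example 4 every single attribute is a key.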
Normal Forms
Normalization of Relations
Normalization: The process of decomposing unsatisfactory "bad" relations
by breaking up their attributes into smaller relations
• Normalization is the process of organizing the data in the database.
• Normalization is used to minimize the redundancy in a relation or set of
relations. It is also used to eliminate undesirable characteristics like insertion,
update and deletion anomalies.
• Normalization divides a larger table into smaller tables and links them using
relationships.
• The normal form is used to reduce redundancy in the database table.

Normal form: Condition using keys and FDs of a relation to certify whether
a relation schema is in a particular normal form

Relationship Between
Normal Forms
Types of Normal Form
• First Normal Form (1NF) - included in the definition of a relation
• Second Normal Form (2NF) - defined in terms of functional dependencies
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF) - defined using multivalued dependencies
• Fifth Normal Form (5NF) or Project Join Normal Form (PJNF) - defined using join dependencies
Normal Forms:

First Normal Form
‘A relation R is in first normal form (1NF) if and only if all underlying
domains contain atomic values only.’

Second Normal Form
‘A relation R is in second normal form (2NF) if and only if it is in 1NF and
every non-key attribute is fully dependent on the primary key.’

Third Normal Form
‘A relation R is in third normal form (3NF) if and only if it is in 2NF and
every non-key attribute is non-transitively dependent on the primary key.’

Boyce-Codd Normal Form
‘A relation R is in Boyce-Codd normal form (BCNF) if and only if every
determinant is a candidate key.’
First Normal Form (1NF):

‘A relation R is in first normal form (1NF) if and only if all underlying
domains contain atomic values only.’

1NF disallows composite attributes, multivalued attributes, and nested
relations, that is, attributes whose values for an individual tuple are non-atomic.
First Normal Form: Example Multivalued attributes

DEPARTMENT
DNAME            DNUMBER   DMGRSSN     DLOCATIONS
Research         5         333445555   Bellaire, Sugarland, Houston
Administration   4         987654321   Stanford
Headquarters     1         888665555   Houston
First Normal Form: Solutions to example
The three alternatives, shown as schemas:

1. Two relations:
Department(DNAME, DNUMBER, DMGRSSN)
Department_Loc(DNUMBER, DLOCATION)

2. One tuple per location:
DNAME            DNUMBER   DMGRSSN     DLOCATION
Research         5         333445555   Bellaire
Research         5         333445555   Sugarland
Research         5         333445555   Houston
Administration   4         987654321   Stanford
Headquarters     1         888665555   Houston

3. One column per location:
Department(DNAME, DNUMBER, DMGRSSN, DLOCATION1, DLOCATION2, DLOCATION3)

First Normal Form: Solutions

There are three main techniques to achieve first normal form for such a
relation:

1. Remove the attribute DLOCATIONS that violates 1NF and place it in a
separate relation DEPT_LOCATIONS along with the primary key
DNUMBER of DEPARTMENT.

2. Expand the key so that there will be a separate tuple in the original
DEPARTMENT relation for each location of a DEPARTMENT. In this
case, the primary key becomes the combination {DNUMBER,
DLOCATION}.

3. If a maximum number of values is known for the attribute (for example, if
it is known that at most three locations can exist for a department),
replace the DLOCATIONS attribute by three atomic attributes:
DLOCATION1, DLOCATION2, and DLOCATION3.
First Normal Form: Best solution

Of the three solutions above, the first is generally considered best
because it does not suffer from redundancy and it is completely general,
having no limit placed on a maximum number of values.
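The first solution can be sketched in Python using the DEPARTMENT rows from the example. Representing the multivalued DLOCATIONS attribute as a list, and the names dept and dept_locations, are assumptions made for this illustration.

```python
# DEPARTMENT with a multivalued DLOCATIONS attribute (not in 1NF).
# Columns: (DNAME, DNUMBER, DMGRSSN, DLOCATIONS)
department = [
    ("Research", 5, "333445555", ["Bellaire", "Sugarland", "Houston"]),
    ("Administration", 4, "987654321", ["Stanford"]),
    ("Headquarters", 1, "888665555", ["Houston"]),
]

# DEPARTMENT without the offending attribute; key = DNUMBER.
dept = [(name, number, mgr) for name, number, mgr, _ in department]

# DEPT_LOCATIONS carries the key DNUMBER plus one atomic location per row;
# its key is the combination (DNUMBER, DLOCATION).
dept_locations = [(number, location)
                  for _, number, _, locations in department
                  for location in locations]
```

Every value is now atomic, no department data is repeated, and a department can have any number of locations.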
First Normal Form: Nested relations
First normal form also disallows multivalued attributes that are
themselves composite.
These are called nested relations because each tuple can have a
relation within it.

EMP_PROJ
                  PROJS
ENO     ENAME     PNUMBER   HOURS
11111   Mahesh    1         32.5
                  2         7.5
22222   Ramesh    2         40.5
                  3         35.6
33333   Kiran     2         25.8
                  10        45
                  20        38.7
44444   Karthik   30        9.7
First Normal Form: Solution

To normalize this into 1NF, we remove the nested relation attributes into a new relation and
propagate the primary key into it; the primary key of the new relation combines the partial key with the
primary key of the original relation.

EMP_PROJ1(ENO, ENAME)
EMP_PROJ2(ENO, PNUMBER, HOURS)
• First Normal Form : Problem
• ROLL NO is the primary key
• Here the COURSE attribute has multiple values

ROLL NO   NAME    COURSE
1         SAI     C, C++
2         HARSH   JAVA
3         OMKAR   C, DBMS

• First Solution:
• Primary key : (ROLL NO, COURSE)

ROLL NO   NAME    COURSE
1         SAI     C
1         SAI     C++
2         HARSH   JAVA
3         OMKAR   C
3         OMKAR   DBMS

• Second Solution:
• Primary key : ROLL NO

ROLL NO   NAME    COURSE 1   COURSE 2
1         SAI     C          C++
2         HARSH   JAVA       NULL
3         OMKAR   C          DBMS

• Third Solution : Best one

Primary key : ROLL NO
ROLL NO   NAME
1         SAI
2         HARSH
3         OMKAR

Primary key : (ROLL NO, COURSE)
Foreign key : ROLL NO
ROLL NO   COURSE
1         C
1         C++
2         JAVA
3         C
3         DBMS
First Normal Form: Problem
Person(ENO, CAR_LIC, PHONE)
where CAR_LIC and PHONE are multivalued attributes.

Is this in 1NF? If not, decompose it.


Example
• A row of data cannot contain a repeating group of data, i.e., each value must be atomic.
Cid   Cname   Subject
101   Jeet    PHY
101   Jeet    CHE
102   Seeta   PHY
103   Sweta   SOCIAL

• Here the student Jeet appears twice in the table and the subject PHY is repeated.
• Another method is to divide the relation into two.
Take the following table.
StudentID is the primary key.

Is it 1NF?
No. There are repeating groups (subject, subjectcost, grade).

How can you make it 1NF?
Create new rows so each cell contains only one value.

But now look: is the studentID primary key still valid?
No, the studentID no longer uniquely identifies each row.
You now need to declare studentID and subject together to uniquely identify each row.
So the new key is StudentID and Subject.

So we now have 1NF.

Is it 2NF?
Before learning 2NF, it is important to know what a partial dependency is.
Partial Dependency
• Partial Dependency: when a non-key attribute is
determined by a part, but not the whole, of a
COMPOSITE primary key.

CUSTOMER
Cust_ID   Name    Order_ID
101       AT&T    1234
101       AT&T    156
125       Cisco   1250

Suppose we have the FDs:
CUST_ID, ORDER_ID → NAME (a dependency on the whole key)
CUST_ID → NAME (NAME depends on only part of the key, so it is partially dependent on the key)
Second Normal Form(2NF):

A relation R is in second normal form (2NF) if and only if it is in 1NF and
every non-key attribute is fully functionally dependent on the primary key
(or candidate key).

There should be no partial dependency.

A functional dependency X → Y is a full functional dependency
if removal of any attribute A from X means that the dependency does not
hold any more.
Key : (StudentID, Subject)
Studentname and address are dependent on studentID (which is part of the key).
This is good.
But they are not dependent on Subject (the other part of the key).
And 2NF requires...
All non-key attributes should be dependent on the ENTIRE key (studentID + subject).
So it's not 2NF.

How can we fix it?
Make new tables
• Make a new table for each primary key field
• Give each new table its own primary key
• Move columns from the original table to the new table that matches
their primary key
Step 1
STUDENT TABLE (key = StudentID)

Step 2
SUBJECTS TABLE (key = Subject)

Step 3
RESULTS TABLE (key = StudentID+Subject)

COLLEGE_DB
STUDENT TABLE (key = StudentID)
SUBJECTS TABLE (key = Subject)
RESULTS TABLE (key = StudentID+Subject)

Step 4 - relationships and cardinality
STUDENT TABLE (key = StudentID), 1 to many, RESULTS TABLE (key = StudentID+Subject)
• Each student can only appear ONCE in the student table.
• A student can be listed MANY times in the results table (for different subjects).
SUBJECTS TABLE (key = Subject), 1 to many, RESULTS TABLE (key = StudentID+Subject)
• Each subject can only appear ONCE in the subjects table.
• A subject can be listed MANY times in the results table (for different students).


A 2NF check
STUDENT TABLE (key = StudentID)
• Name, Address are only dependent on the primary key (StudentID).
SUBJECTS TABLE (key = Subject)
• SubjectCost is only dependent on the primary key, Subject.
RESULTS TABLE (key = StudentID+Subject)
• Grade is fully dependent on the primary key (studentID + subject).

So it is 2NF!
But is it 3NF?
Second Normal Form(2NF): Examples

EMP_PROJ(ENO, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)

FD1 : ENO, PNUMBER → HOURS
FD2 : ENO → ENAME
FD3 : PNUMBER → PNAME, PLOCATION
Second Normal Form(2NF): Testing

The test for 2NF involves testing for functional dependencies whose left-hand
side attributes are part of the primary key.

If the primary key contains a single attribute, the test need not be applied.

If a relation schema is not in 2NF, it can be "second normalized" into a
number of 2NF relations in which nonprime attributes are associated only
with the part of the primary key on which they are fully functionally
dependent.
Second Normal Form(2NF): Solution to example

In functional dependency FD1, the attribute HOURS is fully functionally
dependent on the primary key {ENO, PNUMBER}.

The functional dependencies FD2 and FD3 make ENAME, PNAME, and
PLOCATION partially dependent on the primary key {ENO, PNUMBER}.
Hence, decompose EMP_PROJ into the three relation schemas
EMP_PROJ1, EMP_PROJ2, and EMP_PROJ3:

EMP_PROJ1(ENO, PNUMBER, HOURS)         FD1
EMP_PROJ2(ENO, ENAME)                  FD2
EMP_PROJ3(PNUMBER, PNAME, PLOCATION)   FD3
Example
• Consider the following relation, not in 2NF
- Here Cid & Order_id is the PK
- It is in 1NF
- Not in 2NF, there are partial dependencies of columns on the primary key
- Cname is only dependent on Cid
- Order_name is only dependent on Order_id
- There is no link between Cname & Sale_details
- To reduce this table into 2NF, break the table into 3 different tables
Example

FD : StoreID → Location
Key : CustomerID, StoreID

CustomerID   StoreID   Location
1            1         Delhi
1            3         Mumbai
2            1         Delhi
3            2         Bangalore
4            3         Mumbai

Here, the key is (CustomerID, StoreID). Location is dependent on StoreID
but not on CustomerID. So there exists a partial dependency here,
which violates the rule of 2NF. So this has to be decomposed now.

Solution:
Key : StoreID
StoreID   Location
1         Delhi
2         Bangalore
3         Mumbai

Key : CustomerID, StoreID
CustomerID   StoreID
1            1
1            3
2            1
3            2
4            3

Here the LOCATION is completely dependent on the key.
Example:
• For example, suppose AB is a candidate key of a relation R
• A and B are each proper subsets of AB (AB is a subset of itself, but not a proper one)
• If a part of the candidate key, that is, either A or B alone, determines C,
which is a non-prime attribute, then we say a partial dependency exists.
• Such a relation is not in 2NF
Example : Find out whether this relation is in 2NF
Let R(A,B,C,D,E,F)
FD = {C → F, E → A, EC → D, A → B}
Now let's find the candidate key of this relation, using the closure method discussed
earlier.
First, check the attributes that appear on a right-hand side: F, A, D, B.
The attributes that never appear on a right-hand side are E and C.
These cannot be derived from anything else, so whatever the candidate key is, E and C are definitely present in it.
So start with the closure of EC itself.
EC+ = {E, C, F, A, D, B}
Hence all the attributes of the table are determined by EC.
So, we can say EC is the candidate key for this table.
CK = {EC}
Prime Attributes : {E, C}
Non Prime Attributes : {A, B, D, F}
Now, check whether any non-prime attribute is determined by a part of the candidate key.
YES!! C → F and E → A are partial dependencies, so R is not in 2NF.
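The last step of this example can be sketched as a small routine that takes the candidate keys and reports every non-prime attribute determined by a proper part of a key. The function names are illustrative. Because the routine works with closures, it also reports dependencies that are only implied, such as E → B (from E → A and A → B).

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(fds, keys):
    """Pairs (part_of_key, non_prime_attribute) that violate 2NF."""
    prime = set().union(*map(set, keys))
    found = set()
    for key in keys:
        # Every proper, non-empty subset of the candidate key.
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                for attr in closure(part, fds) - prime:
                    found.add(("".join(part), attr))
    return found


# R(A,B,C,D,E,F) with C -> F, E -> A, EC -> D, A -> B and candidate key EC.
fds = [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]
violations = partial_dependencies(fds, ["EC"])
in_2nf = not violations
```

D is not reported because it needs both E and C, so it is fully dependent on the key.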
Before learning 3NF you have to learn about
transitive dependency
Transitive Dependency
• Transitive Dependency – A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies: if X → Y and Y → Z hold, then X → Z holds transitively through Y.
• The advantages of removing a transitive dependency are:
• The amount of data duplication is reduced.
• Data integrity is achieved.
EMPLOYEE

Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
111     Mary    Jones   1        Acct
122     Sarah   Smith   2        Mktg

Emp_ID → F_Name, L_Name, Dept_ID
Dept_ID → Dept_Name

So Dept_Name depends on Emp_ID transitively, through Dept_ID.
Third Normal Form(3NF):

A relation R is in third normal form (3NF) if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.

A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional dependency X → A holds in R, either (a) X is a super key of R, or (b) A is a prime attribute of R.

A relation is said to be in 3NF if and only if,
• It is in 2NF, and
• There is no transitive dependency.
Example:
KEY : {ROLL NO}
FD : { Roll no → State , State → City }
Prime Attributes : {Roll no}
Non Prime Attributes : {State, City}

ROLL NO  STATE    CITY
1        Punjab   Mohali
2        Haryana  Ambala
3        Punjab   Mohali
4        Haryana  Ambala
5        Bihar    Patna

So, there exists a transitive dependency here. Let's decompose.
• Solution :

RS Table
ROLL NO  STATE
1        Punjab
2        Haryana
3        Punjab
4        Haryana
5        Bihar

RC Table (each State → City pair is stored once)
STATE    CITY
Punjab   Mohali
Haryana  Ambala
Bihar    Patna


• Exercise : consider the relation (TOURNAMENT, YEAR, WINNER, WINNER_DOB) with
TOURNAMENT, YEAR → WINNER
WINNER → WINNER_DOB
Is it in 3NF? If not, then decompose.
A 3NF check

The tables under review are the STUDENT TABLE (key = StudentID), the SUBJECTS TABLE (key = Subject) and the RESULTS TABLE (key = StudentID+Subject).

Is the STUDENT TABLE in 3NF?
• HouseName is dependent on both StudentID + HouseColour,
• OR HouseColour is dependent on both StudentID + HouseName.
• But either way, non-key fields are dependent on MORE THAN THE PRIMARY KEY (StudentID).
• And 3NF says that non-key fields must depend on nothing but the key.

StudentID → StudentName, Address, HouseName
HouseName → HouseColour

What do we do? Again, carve off the offending fields.

A 3NF fix: HouseColour moves into a table of its own, keyed on HouseName.

A 3NF win! (* = primary key)

StudentTable : StudentID*, StudentName, Address, HouseName
GradesTable  : StudentID*, Subject*, Grade
SubjectTable : Subject*, SubjectCost
HouseTable   : HouseName*, HouseColour

Relationships (1 : ∞): StudentTable to GradesTable, SubjectTable to GradesTable, HouseTable to StudentTable.

Third Normal Form(3NF): Example

EMP_PROJ

ENO ENAME DOB ADDRESS DNUMBER DNAME DMGRENO

Here the attributes DNAME and DMGRENO are transitively dependent on ENO, through DNUMBER (ENO → DNUMBER and DNUMBER → DNAME, DMGRENO).
Third Normal Form(3NF): Solution to example

Decompose the relation into two relations.

EMP_PROJ1

ENO ENAME DOB ADDRESS DNUMBER

EMP_PROJ2

DNUMBER DNAME DMGRENO


Example
-With exam_name and total_marks added to our Score table, it now stores more data. The primary key for our Score table is a composite key, which means it is made up of two attributes or columns → student_id + subject_id.

-Our new column exam_name depends on both student and subject. For example, a mechanical engineering student will have a Workshop exam but a computer science student won't. And some subjects have Practical exams while others don't. So we can say that exam_name is dependent on both student_id and subject_id.

-And what about our second new column, total_marks? Does it depend on our Score table's primary key?

-Well, the column total_marks depends on exam_name, because the total score changes with the exam type. For example, practicals carry fewer marks while theory exams carry more.

-But exam_name is just another column in the Score table. It is not a primary key or even a part of the primary key, and total_marks depends on it.

-This is a Transitive Dependency: a non-prime attribute depends on another non-prime attribute rather than depending directly on the prime attributes or primary key.
Third Normal Form(3NF): Example
Consider a relation R with simple attributes {A, B, C, D, E}.
R contains two candidate keys, {A} and {B,C}.
{A} is made the primary key.
The following FDs hold over R:
A → BCDE
D → E

Is R in 3NF? If not, decompose.
Third Normal Form(3NF): Example to check whether a relation is in 3NF ?

• Consider a relation R with simple attributes {A, B, C, D}.
• The following FDs hold over R:
AB → C
C → D
AB+ = {ABCD}
CK : {AB}
Prime Attributes : A, B
Non Prime Attributes : C, D
Here AB, the candidate key, determines the non-prime attribute C, which is perfectly fine!
C, however, is a non-prime attribute that determines another non-prime attribute, D, and this violates 3NF.
If C were a super key, or if D were a prime attribute, there would be no problem. Here neither holds: a non-prime attribute determines another non-prime attribute. This relation is NOT in 3NF.
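The same check can be written out as a small Python sketch. It is an illustration under stated assumptions rather than a standard routine: the helper names are made up, candidate keys are found by brute force, and only the FDs as given (split into single right-hand attributes) are tested against conditions (a) and (b) of the 3NF definition.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Brute-force search for the minimal attribute sets that determine attrs."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # not minimal
            if closure(subset, fds) == attrs:
                keys.append(subset)
    return keys

def third_nf_violations(attrs, fds):
    """List (X, A) pairs where X -> A breaks both 3NF conditions."""
    prime = set().union(*candidate_keys(attrs, fds))
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == attrs      # condition (a)
        for attr in sorted(rhs - lhs):                # ignore trivial parts
            if not is_superkey and attr not in prime:  # condition (b) fails too
                violations.append(("".join(sorted(lhs)), attr))
    return violations

R = set("ABCD")
FDS = [(set("AB"), set("C")), (set("C"), set("D"))]

print(third_nf_violations(R, FDS))  # [('C', 'D')]
```

An empty list would mean no given FD violates 3NF; here C → D is reported, matching the conclusion above.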
Boyce-Codd Normal Form(BCNF) / 3.5 NF:

A relation R is in Boyce-Codd normal form (BCNF) if and only if it is in 3NF and every determinant is a candidate key.

A relation schema R is in BCNF if, whenever a nontrivial functional dependency X → A holds in R, X is a super key of R.

The only difference between the definitions of BCNF and 3NF is that condition (b) of 3NF, which allows A to be prime, is absent from BCNF.
3NF Table Not in BCNF

A, B → C
A, B → D
C → B

[Figure 4.7: Decomposition of Table Structure to Meet BCNF]
[Figure: BCNF Conversion Results]
Example

ROLL NO  NAME   VOTER_ID  AGE
1        RAVI   K0123     20
2        VARUN  M034      21
3        RAVI   K786      23
4        RAHUL  D286      21
5        VARUN  H876      20

CK : {ROLLNO, VOTER_ID} (each attribute is a candidate key on its own)
PK : {ROLL NO}
FD's:
ROLLNO → NAME
ROLLNO → VOTER_ID
VOTER_ID → AGE
VOTER_ID → ROLLNO

Decomposed tables:

ROLLNO  NAME
1       RAVI
2       VARUN
3       RAVI
4       RAHUL
5       VARUN

ROLLNO  VOTER_ID
1       K0123
2       M034
3       K786
4       D286
5       H876

VOTER_ID  AGE
K0123     20
M034      21
K786      23
D286      21
H876      20
Boyce-Codd Normal Form(BCNF): Example

Consider a relation TEACH with the following dependencies:

FD1: {STUDENT, COURSE} → INSTRUCTOR
FD2: INSTRUCTOR → COURSE

Note that {STUDENT, COURSE} is a candidate key for this relation.
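To see why TEACH is the textbook case of "3NF but not BCNF", the BCNF condition can be tested directly. The Python sketch below is illustrative only (the function names are invented): it flags every given nontrivial FD whose left side is not a super key.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Nontrivial FDs X -> Y in fds whose left side X is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attrs
    ]

TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
FDS = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),  # FD1
    ({"INSTRUCTOR"}, {"COURSE"}),             # FD2
]

print(bcnf_violations(TEACH, FDS))  # [({'INSTRUCTOR'}, {'COURSE'})]
```

FD2 fails BCNF because INSTRUCTOR alone is not a super key. It still passes 3NF, since COURSE is a prime attribute (it belongs to the candidate key {STUDENT, COURSE}).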


Boyce and Codd Normal Form (BCNF)
• Boyce and Codd Normal Form is a stricter version of the Third Normal Form. It deals with a certain type of anomaly that is not handled by 3NF.

• A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF.

• For a table to be in BCNF, the following conditions must be satisfied:
• - R must be in 3rd Normal Form,
• - and, for each nontrivial functional dependency ( X → Y ), X should be a super key.
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left-hand side of both functional dependencies is a key.
Example

In the table above, student_id and subject together form the primary key.

FD's:
student_id, subject → professor
professor → subject

• We have an insertion anomaly here: without any student we cannot add a DBMS faculty member, because student_id can't be null.
• We also have a deletion anomaly here: if we delete student 103, the details of P.Chash are deleted as well.
• Why is the table not in BCNF?
• In the table above, student_id and subject form the primary key, which means the subject column is a prime attribute.
• But there is one more dependency, professor → subject.
• Subject is a prime attribute, but professor is a non-prime attribute and not a super key, so this dependency is not allowed by BCNF.

• How to satisfy BCNF?
• To make this relation (table) satisfy BCNF, we decompose it into two tables, a student table and a professor table.
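Student  subject
101      Java
101      C++
102      Java
103      C#
104      Java

Subject  professor
Java     p.Java
C#       p.Chash
C++      p.Cpp

• And now, each table satisfies Boyce-Codd Normal Form. Note that this split is lossless only while each subject has a single professor; in general, decompose on professor → subject into (professor, subject) and (student_id, professor).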


Problems with Decompositions:

Three problems to consider:

• Some queries become more expensive.
• Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation!
• Checking some dependencies may require joining the instances of the decomposed relations.
Before learning about 4NF you have to learn
about Multi Valued Dependency
Multivalued Dependency
• Multivalued Attributes (or repeating groups): Suppose there are 3 attributes A, B, C. For each value of A there is a well-defined set of values for B, and for each value of A there is a well-defined set of values for C, but B and C are independent of each other.

STUDENT

Stud_ID Name Course_ID Units


101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Multivalued Dependencies:
Multivalued Dependencies: Definition

Let R be a relation schema and let X and Y be subsets of the attributes of R.

The multivalued dependency X →→ Y is said to hold over R if, in every legal instance r of R, each X value is associated with a set of Y values and this set is independent of the values in the other attributes.

Formally, if the MVD X →→ Y holds over R and Z = R − XY, the following must be true for every legal instance r of R:

If t1 є r, t2 є r and t1.X = t2.X, then there must be some t3 є r and t4 є r such that

t1.XY = t3.XY and t2.Z = t3.Z,
t2.XY = t4.XY and t1.Z = t4.Z.
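The formal condition can be tested on a concrete instance. Below is a minimal Python sketch of that test; the function name holds_mvd and the sample course data (teacher and book names) are hypothetical, chosen only for illustration. For every pair t1, t2 that agree on X, it builds the tuple t3 with the XY part of t1 and the Z part of t2 and checks that t3 is present. Because the loop visits each pair in both orders, the t4 condition is covered as well.

```python
def holds_mvd(rows, x, y):
    """Check X ->-> Y on one instance; rows are dicts with identical keys."""
    if not rows:
        return True
    attrs = list(rows[0])
    xy = [a for a in attrs if a in x or a in y]
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 takes its XY values from t1 and its Z values from t2.
                t3 = tuple(t1[a] if a in xy else t2[a] for a in attrs)
                if t3 not in present:
                    return False
    return True

# Hypothetical instance: one course, two teachers, two books.
ctb = [
    {"course": "Physics101", "teacher": "Green", "book": "Mechanics"},
    {"course": "Physics101", "teacher": "Green", "book": "Optics"},
    {"course": "Physics101", "teacher": "Brown", "book": "Mechanics"},
    {"course": "Physics101", "teacher": "Brown", "book": "Optics"},
]

print(holds_mvd(ctb, ["course"], ["teacher"]))       # True
print(holds_mvd(ctb[:-1], ["course"], ["teacher"]))  # False: one pairing is missing
```

A single instance can only refute an MVD, not prove it for the schema; the definition quantifies over every legal instance.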
Multivalued Dependencies: Armstrong Axioms

MVD Complementation: If X →→ Y, then X →→ R − XY.

MVD Augmentation: If X →→ Y and W ⊇ Z, then WX →→ YZ.

MVD Transitivity: If X →→ Y and Y →→ Z, then X →→ (Z − Y).

Replication: If X → Y, then X →→ Y.

Coalescence: If X →→ Y and there is a W such that W ∩ Y is empty, W → Z, and Y ⊇ Z, then X → Z.
• So, three conditions of Multi Valued Dependency are :

• If all these conditions hold true, then we can say that the table contains a multi valued dependency.
• Also, a multi valued dependency can exist for more than 1 column.
Example

• A student here has opted for two subjects and has 2 hobbies.

• What problem could this lead to?

• The student with s_id = 1 gives rise to 2 additional rows to record the hobbies.
Fourth Normal Form(4NF):

Let R be a relation schema, X and Y be nonempty subsets of the attributes of R, and F be a set of dependencies that includes both FDs and MVDs.

R is said to be in fourth normal form (4NF) if and only if it is in 3NF and for every MVD X →→ Y that holds over R, one of the following statements is true:

a) Y ⊆ X or XY = R, or

b) X is a superkey.

Informally, a relation is said to be in 4NF if and only if it is in 3NF and it has NO nontrivial multi valued dependency.
It is always advised to keep such attributes in independent tables.
So, if we decompose this,
• Sometimes, a table can have both a Functional Dependency and a Multi-Valued Dependency.
• Let's add an Address column to our original table.

• So, now this satisfies the 4th NF.


Multivalued Dependencies:

C →→ T
C →→ B
Does a multi valued dependency exist on the above relation?
YES!!
• Each course has several teachers and each course has several textbooks. The textbook used for a given course is independent of the teacher.
• If you want to add a new textbook to the Physics101 course, we need to add two new rows, one for each teacher of the course. This is a multi valued dependency.
• We can eliminate the resulting redundancy by decomposing CTB into CT and CB; each of these relations is then in 4NF.
Because an MVD exists in this table, the table suffers from insertion, update and deletion anomalies.
So, to solve this problem, the table has to be split.
Course →→ Teacher
Course →→ Book
CT_relation
CB_relation

So, this satisfies 4th Normal Form.



Before learning about 5NF you have to learn
about Join Dependency
Example

DEPT  SUBJECT  STUDENT
CSE   CS101    SHREYA
IT    IT501    YUG
CSE   CS102    RUTHVI
CSE   CS103    RINI
ME    ME201    SUSHANT
EC    EC301    IRA

Dept →→ student
Dept →→ subject
This table is not in 4NF, and to be in 5NF it must first be in 4NF. So, let's decompose.

Decomposed in such a way that the tables are in 4NF now:

DEPT  SUBJECT
CSE   CS101
IT    IT501
CSE   CS102
CSE   CS103
ME    ME201
EC    EC301

DEPT  STUDENT
CSE   SHREYA
IT    YUG
CSE   RUTHVI
CSE   RINI
ME    SUSHANT
EC    IRA

• Now, if we try to join these tables, does this yield the original relation?
• Let's check: on the basis of the common column we try to recreate the original table. Let's write a query for that:
• Select * from dsub d1, dstu d2 where d1.dept = d2.dept;

DEPT  SUBJECT  STUDENT
CSE   CS101    SHREYA
CSE   CS101    RUTHVI
CSE   CS101    RINI
IT    IT501    YUG
CSE   CS102    SHREYA
CSE   CS102    RUTHVI
CSE   CS102    RINI
CSE   CS103    SHREYA
CSE   CS103    RUTHVI
CSE   CS103    RINI
ME    ME201    SUSHANT
EC    EC301    IRA

• According to the lossless property of join, no information should be lost and no additional information should appear. Here, we should be getting the original table with 6 tuples, but we got more tuples than required, so the result is inconsistent with the original.
• Extra tuples have been produced.
• These are also known as spurious tuples.
• From the resulting tuples we can say that the join dependency does not hold for this decomposition.
• Now, to avoid this, we need 5NF.
Join Dependencies:

• Join dependency is a further generalization of multivalued dependency. If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency (JD) exists, where R1(A, B, C) and R2(C, D) are the decomposition of a given relation R(A, B, C, D).
• Alternatively, R1 and R2 are a lossless decomposition of R. A JD ⋈ {R1, R2, …, Rn} is said to hold over a relation R if R1, R2, …, Rn is a lossless-join decomposition of R.
• *((A, B, C), (C, D)) is a JD of R if the join of these projections is equal to the relation R.
• Here, *(R1, R2, R3) is used to indicate that relations R1, R2, R3 and so on are a JD of R.
• R → R1, R2, R3 → (R1 ⋈ R2) ⋈ R3 → should give the original relation R.
Fifth Normal Form:

The 5NF (Fifth Normal Form) is also known as project-join normal form. A relation is in Fifth Normal Form (5NF) if it is in 4NF and cannot be decomposed losslessly into any further smaller tables.
In order to have the above example in 5NF, decompose again, this time into three tables:

dsub
DEPT  SUBJECT
CSE   CS101
IT    IT501
CSE   CS102
CSE   CS103
ME    ME201
EC    EC301

dstu
STUDENT  DEPT
SHREYA   CSE
YUG      IT
RUTHVI   CSE
RINI     CSE
SUSHANT  ME
IRA      EC

substu
SUBJECT  STUDENT
CS101    SHREYA
IT501    YUG
CS102    RUTHVI
CS103    RINI
ME201    SUSHANT
EC301    IRA

• Now, write a query to join the above three tables, taking Dept from d1, Student from d2 and Subject from d3:
• Select d1.dept, d3.subject, d2.student from dsub d1, dstu d2, substu d3 where d1.dept=d2.dept and d2.student=d3.student and d1.subject=d3.subject;
• Now the result of this query will be:

DEPT  SUBJECT  STUDENT
CSE   CS101    SHREYA
IT    IT501    YUG
CSE   CS102    RUTHVI
CSE   CS103    RINI
ME    ME201    SUSHANT
EC    EC301    IRA

• Thus this decomposition satisfies the join dependency and removes the spurious tuples from the relation.
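The two joins can be replayed outside the database. The following Python sketch is an illustration, not the slides' own method: it reuses the table names dsub, dstu and substu from the queries above, models each table as a set of tuples, and compares the two-way and three-way joins with the original relation.

```python
# Original relation: (DEPT, SUBJECT, STUDENT).
original = {
    ("CSE", "CS101", "SHREYA"),
    ("IT", "IT501", "YUG"),
    ("CSE", "CS102", "RUTHVI"),
    ("CSE", "CS103", "RINI"),
    ("ME", "ME201", "SUSHANT"),
    ("EC", "EC301", "IRA"),
}

# The three projections used in the decomposition.
dsub = {(dept, subject) for dept, subject, _ in original}
dstu = {(dept, student) for dept, _, student in original}
substu = {(subject, student) for _, subject, student in original}

# Two-way join of dsub and dstu on DEPT.
two_way = {
    (dept, subject, student)
    for dept, subject in dsub
    for dept2, student in dstu
    if dept == dept2
}

# Three-way join: also require the (SUBJECT, STUDENT) pair to exist in substu.
three_way = {row for row in two_way if (row[1], row[2]) in substu}

spurious = two_way - original

print(len(two_way))           # 12 tuples: 6 original + 6 spurious
print(len(spurious))          # 6
print(three_way == original)  # True: the third table filters out the spurious rows
```

All six spurious tuples belong to CSE, the only department with more than one subject and more than one student.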
