Professional Documents
Culture Documents
t1[α] t1[β]
α⊆R
t2[α] t2[β]
β⊆R ……. …….
• Then, for a functional dependency to exist from α to
β,
• If t1[α] = t2[α], then t1[β] = t2[β] f :α→β
d
Types Of Functional Dependencies
• Trivial Functional Dependencies
• Non-trivial Functional Dependencies
Examples-
Decomposition-
If A → BC, then A → B and A → C always holds.
Composition-
If A → B and C → D, then AC → BD always holds.
Additive-
If A → B and A → C, then A → BC always holds.
Rules for Functional Dependency
• R1
– A functional dependency X → Y will always hold if all the values of X
are unique (different) irrespective of the values of Y
A→B 5 4 3 2 2
A → BC 8 5 3 2 1
A → CD 1 9 3 3 5
4 7 3 3 8
A → BCD
A → DE
A → BCDE
In general, we can say following functional dependency will always
hold-
• R2: A functional dependency X → Y will always hold if all the values
of Y are same irrespective of the values of X.
• The following functional dependencies will always hold since all the
values of attribute ‘C’ are same-
A→C A B C D E
AB → C 5 4 3 2 2
ABDE → C 8 5 3 2 1
1 9 3 3 5
DE → C
4 7 3 3 8
AE → C
Step-02:
• Take the functional dependencies of set F into consideration.
• For each functional dependency X → Y, find the closure of X using the functional
dependencies of set F.
Step-03:
In DBMS,
• A canonical cover is a simplified and reduced version of the given set of functional
dependencies.
• Since it is a reduced version, it is also called as Irreducible set.
Characteristics-
• Canonical cover is free from all the extraneous functional dependencies.
• The closure of canonical cover is same as that of the given set of functional
dependencies.
• Canonical cover is not unique and may be more than one for a given set of functional
dependencies.
Need-
• Working with the set containing extraneous functional dependencies increases the
computation time.
• Therefore, the given set is reduced by eliminating the useless functional
dependencies.
• This reduces the computation time and working with the irreducible set becomes
Steps To Find Canonical Cover
Step-01:
• Write the given set of functional dependencies in such a way that each functional
dependency contains exactly one attribute on its right side.
Example-
The functional dependency X → YZ will be written as-
X→Y
X→Z
Step-02:
• Consider each functional dependency one by one from the set obtained in Step-
01.
• Determine whether it is essential or non-essential.
• If results come out to be same, It means that the presence or absence of that
functional dependency does not create any difference.
• Thus, it is non-essential.
• Eliminate that functional dependency from the set.
•
NOTE-
• Eliminate the non-essential functional dependency from the set as soon as it
is discovered.
• Do not consider it while checking the essentiality of other functional
dependencies.
Case-01: Results Come Out to be Different
• If results come out to be different,
• It means that the presence or absence of that functional dependency creates a
difference.
• Thus, it is essential.
• Do not eliminate that functional dependency from the set.
Step-03:Consider the newly obtained set of functional dependencies after performing Step-
02. Check if there is any functional dependency that contains more than one attribute on its
left side.
• There exists no functional dependency containing more than one attribute on its left
side.
• In this case, the set obtained in Step-02 is the canonical cover.
Case-01: Yes-
• There exists at least one functional dependency containing more than one attribute on
its left side.
• In this case, consider all such functional dependencies one by one.
• Check if their left side can be reduced.
Use the following steps to perform a check-
– Consider a functional dependency.
– Compute the closure of all the possible subsets of the left side of that functional dependency.
– If any of the subsets produce the same closure result as produced by the entire left side, then
replace the left side with that subset.
Problem-
• The following functional dependencies hold true for the relational scheme R ( W , X , Y , Z ) –
X → W, WZ → XY, Y → WXZ
• Write the irreducible equivalent for this set of functional dependencies.
Solution:
Step-01:
• Write all the functional dependencies such that each contains exactly one attribute on its right
side-
X → W, WZ → X, WZ → Y, Y → W, Y → X, Y → Z
Step-02:
Check the essentiality of each functional dependency one by one.
For X → W:
Now, Clearly, the two results are different. Thus, we conclude that X → W is essential and can not
be eliminated.
For WZ → X:
Now,
Clearly, the two results are same. Thus, we conclude that WZ → X is non-essential and can be eliminated.
For WZ → Y:
Now,
Clearly, the two results are different. Thus, we conclude that WZ → Y is essential and can not be eliminated.
For Y → W:
Now,
Clearly, the two results are same. Thus, we conclude that Y → W is non-essential and can be eliminated.
For Y → Z:
Now, Clearly, the two results are different. Thus, we conclude that Y → Z is
essential and can not be eliminated.
1 2 1
2 5 3
3 3 3
• This relation is not same as the original relation R and contains some
extraneous tuples.
• Clearly, R1 ⋈ R2 ⊃ R.
• Thus, we conclude that the above decomposition is lossy join decomposition.
• NOTE-
• Lossy join decomposition is also known as careless decomposition.
• This is because extraneous tuples get introduced in the natural join of the
sub-relations.
• Extraneous tuples make the identification of the original tuples difficult.
Determining Whether Decomposition Is Lossless Or Lossy-
• Consider a relation R is decomposed into two sub relations R1 and R2.
• Then, If all the following conditions satisfy, then the decomposition is
lossless.
• If any of these conditions fail, then the decomposition is lossy.
• Condition-01:
• Union of both the sub relations must contain all the attributes that are present
in the original relation R.
Thus, R1 ∪ R2 = R
Condition-02:
• Intersection of both the sub relations must not be null.
• In other words, there must be some common attribute which is present in both
the sub relations.
• Thus, R1 ∩ R2 ≠ ∅
• Condition-03:
• Intersection of both the sub relations must be a super key of either R1 or R2 or
both
DETERMINE WHETHER DECOMPOSITION IS LOSSLESS OR LOSSY
Condition-01:
According to condition-01, union of both the sub relations must contain all the attributes of
relation R’.
So, we have-
R1 ( A , B ) ∪ R 2 ( B , C )
= R’ ( A , B , C )
Clearly, union of the sub relations contain all the attributes of relation R’.
Thus, condition-01 satisfies.
Condition-02:
According to condition-02, intersection of both the sub relations must not be null.
So, we have-
R1 ( A , B ) ∩ R 2 ( B , C )
=B
Clearly, intersection of the sub relations is not null.
Condition-03:
• According to condition-03, intersection of both the sub relations must be the
super key of one of the two sub relations or both.
• So, we have-
• R1 ( A , B ) ∩ R2 ( B , C )= B
• Now, the closure of attribute B is-
• B+ = { B , C , D }
• Now, we see-
• Attribute ‘B’ can not determine attribute ‘A’ of sub relation R 1.
• Thus, it is not a super key of the sub relation R 1.
• Attribute ‘B’ can determine all the attributes of sub relation R 2.
• Thus, it is a super key of the sub relation R 2.
• Clearly, intersection of the sub relations is a super key of one of the sub relations
• So, condition-03 satisfies.
• Thus, we conclude that the decomposition is lossless.
• Conclusion-
• Overall decomposition of relation R into sub relations R 1, R2 and R3 is lossless.
Normalization
• In DBMS, database normalization is a process
of making the database consistent by
• Reducing the redundancies
• Ensuring the integrity of data through lossless
decomposition
• Normalization is done through normal forms.
• Information is stored redundantly
– Wastes storage
– Causes problems with update anomalies
• Insertion anomalies
• Deletion anomalies
• Modification anomalies
• Consider the relation:
– EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
• Update Anomaly:
– Changing the name of project number P1 from “Billing”
to “Customer-Accounting” may cause this update to be
made for all 100 employees working on project P1.
Insertion and Deletion Anomaly
• Consider the relation:
– EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
• Insert Anomaly:
– Cannot insert a project unless an employee is assigned to it.
• Conversely
– Cannot insert an employee unless an he/she is assigned to a
project.
• Consider the relation:
– EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
• Delete Anomaly:
– When a project is deleted, it will result in deleting all the
employees who work on that project.
– Alternately, if an employee is the sole employee on a project,
deleting that employee would result in deleting the corresponding
Normal Forms
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
First Normal Form-
In other words,
• A → B is called a partial dependency if and only if-
• A is a subset of some candidate key
• B is a non-prime attribute.
• If any one condition fails, then it will not be a partial dependency.
NOTE-
• To avoid partial dependency, incomplete candidate key must not determine any
non-prime attribute.
Example
• Consider a relation- R ( V , W , X , Y , Z ) with functional
dependencies-
VW → XY Y→V WX → YZ
Now,
• It is clear that there are no non-prime attributes in the relation.
• In other words, all the attributes of relation are prime attributes.
• Thus, all the attributes on RHS of each functional dependency are prime
attributes.
Boyce-Codd Normal Form-
• A given relation is called in BCNF if and only if-
• Relation already exists in 3NF.
• For each non-trivial functional dependency A → B, A is a super key of the
relation.
Example-
P1
• A relation in BCNF will surely be in all other normal forms.
• A relation in 3NF will surely be in 2NF and 1NF.
• A relation in 2NF will surely be in 1NF.
P2
• BCNF is stricter than 3NF.
• 3NF is stricter than 2NF.
• 2NF is stricter than 1NF.
P3
• Start checking from BCNF.
• This is because if it is found to be in BCNF, then it will surely be in
all other normal forms.
• If the relation is not in BCNF, then start moving towards the outer
circles and check for other normal forms in the order they appear.
P4
• In a relational database, a relation is always in First Normal Form (1NF) at
least.
P5
• Singleton keys are those that consist of only a single attribute.
• If all the candidate keys of a relation are singleton candidate keys, then it
will always be in 2NF at least.
• This is because there will be no chances of existing any partial dependency.
• The candidate keys will either fully appear or fully disappear from the
dependencies.
• Thus, an incomplete candidate key will never determine a non-prime
attribute.
P6
• If all the attributes of a relation are prime attributes, then it will always be
in 2NF at least.
• This is because there will be no chances of existing any partial dependency.
• Since there are no non-prime attributes, there will be no Functional
Dependency which determines a non-prime attribute.
P7
• If all the attributes of a relation are prime attributes, then it will always be
in 3NF at least.
• This is because there will be no chances of existing any transitive
dependency for non-prime attributes.
P8
• Third Normal Form (3NF) is considered adequate for normal relational
database design.
P9
• Every binary relation (a relation with only two attributes) is always in
BCNF.
P10
• BCNF is free from redundancies arising out of functional dependencies
(zero redundancy).
P11:
• A relation with only trivial functional dependencies is always in BCNF.
• In other words, a relation with no non-trivial functional dependencies is
P12
– BCNF decomposition is always lossless but not always dependency
preserving.
P13
– Sometimes, going for BCNF may not preserve functional dependencies.
– So, go for BCNF only if the lost functional dependencies are not required
else normalize till 3NF only.
P14
– There exist many more normal forms even after BCNF like 4NF and more.
– But in the real world database systems, it is generally not required to go
beyond BCNF.
P15
– Lossy decomposition is not allowed in 2NF, 3NF and BCNF.
– So, if the decomposition of a relation has been done in such a way that it is
lossy, then the decomposition will never be in 2NF, 3NF and BCNF.
P16
– Unlike BCNF, Lossless and dependency preserving decomposition into
3NF and 2NF is always possible.
P17
– A prime attribute can be transitively dependent on a key in a 3NF
relation.
– A prime attribute can not be transitively dependent on a key in a
BCNF relation.
P18
– If a relation consists of only singleton candidate keys and it is in 3NF,
then it must also be in BCNF
P19
– If a relation consists of only one candidate key and it is in 3NF, then
the relation must also be in BCNF.
Multivalued Dependency
• Multivalued dependency occurs when two attributes in a table are independent of each other but,
both depend on a third attribute.
• A multivalued dependency consists of at least two attributes that are dependent on a third attribute
that's why it always requires at least three attributes.
• Example: Suppose there is a bike manufacturer company which produces two colors(white and
black) of each model every year
• Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent
of each other. BIKE_MODEL MANUF_YEAR COLOR
• these two columns can be called M2001 2008 White
• as multivalued dependent on BIKE_MODEL.
M2001 2008 Black
• The representation of these dependencies is
M3001 2013 White
• shown below:
1. BIKE_MODEL → → MANUF_YEAR M3001 2013 2013
2. BIKE_MODEL → → COLOR M4006 2017 White
M4006 2017 Black
• The given STUDENT table is in 3NF, but the COURSE and HOBBY are two
independententity. Hence, there is no relationship between COURSE and HOBBY.
• In the STUDENT relation, a student with STU_ID, 21 contains two
courses, Computer and Math and two hobbies, Dancing and Singing. So there is a
Multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
• So to make the above table into 4NF, we can decompose it into two tables:
• Student_course
Multivalued Dependencies and Fourth
Normal Form
(a)The EMP relation with two MVDs: ENAME —>>
PNAME and ENAME —>> DNAME.
(b)Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS.
(c) The relation SUPPLY with no MVDs is in 4NF but
not in 5NF if it has the JD(R1, R2, R3). (d)
Decomposing the relation SUPPLY into the 5NF
relations R1, R2, and R3.
Multivalued Dependency (MVD)
Definition:
• A multivalued dependency (MVD) X —>> Y specified on relation schema R,
where X and Y are both subsets of R, specifies the following constraint on any
relation state r of R: If two tuples t1 and t2 exist in r such that t1[X] = t2[X], then
two tuples t3 and t4 should also exist in r with the following properties, where we
use Z to denote (R 2 (X υ Y)):
– t3[X] = t4[X] = t1[X] = t2[X].
– t3[Y] = t1[Y] and t4[Y] = t2[Y].
– t3[Z] = t2[Z] and t4[Z] = t1[Z].
• An MVD X —>> Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X
υ Y = R.
4NF
Definition:
• A relation schema R is in 4NF with respect to a set of dependencies F (that
includes functional dependencies and multivalued dependencies) if, for every
nontrivial multivalued dependency X —>> Y in F+, X is a superkey for R.
– Note: F+ is the (complete) set of all dependencies (functional or multivalued)
that will hold in every relation state r of R that satisfies F. It is also called the
closure of F.
Lossless (Non-additive) Join Decomposition into
4NF Relations:
• PROPERTY LJ1’
– The relation schemas R1 and R2 form a lossless (non-additive)
join decomposition of R with respect to a set F of functional and
multivalued dependencies if and only if
• (R1 ∩ R2) —>> (R1 - R2)
– or by symmetry, if and only if
• (R1 ∩ R2) —>> (R2 - R1)).
Join Dependency
Definition:
• A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on
relation schema R, specifies a constraint on the states r of R.
– The constraint states that every legal state r of R should have a non-
additive join decomposition into R1, R2, ..., Rn; that is, for every such r
we have
– * (ΠR1(r), Π R2(r), ..., Π Rn(r)) = r
Note: an MVD is a special case of a JD where n = 2.
• A join dependency JD(R1, R2, ..., Rn), specified on relation schema
R, is a trivial JD if one of the relation schemas Ri in JD(R1, R2, ...,
Rn) is equal to R.
Definition:
• A relation schema R is in fifth normal form (5NF) (or Project-Join Normal
Form (PJNF)) with respect to a set F of functional, multivalued, and join
dependencies if,
– for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied
by F),
• every Ri is a superkey of R.
INCLUSION DEPENDENCIES
Inclusion dependencies were defined in order to formalize two types of inter relational
constraints:
• The foreign key constraint cannot be specified as a functional or multivalued dependency
because it relates attributes across relations.
• The constraint between two relations that represent a class/subclass relationship also has no
formal definition in terms of the functional, multivalued, and join dependencies.
Definition. An inclusion dependency R.X < S.Y between two sets of attributes-X of relation
schema R, and Y of relation schema S-specifies the constraint that, at any specific time when
r is a relation state of R and s a relation state of S, we must have