Systems
Program & Semester: B.Tech & III SEM
AI&DS
Academic Year: 2023 - 24
UNIT III
Normalization
The following guidelines determine the quality of a relation schema design:
Making sure that the semantics of the attributes is clear in the schema
Reducing the redundant information in tuples
Reducing the NULL values in tuples
Disallowing the possibility of generating spurious tuples.
Functional Dependency
A functional dependency, denoted by X → Y, between two sets of attributes X and Y
that are subsets of R specifies a constraint on the possible tuples that can form a
relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have
t1[X] = t2[X], they must also have t1[Y] = t2[Y].
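The definition above can be tested directly on a relation state. The following is a minimal Python sketch, assuming a relation is stored as a list of dictionaries; the function name fd_holds and the sample tuples are hypothetical and are not part of the original notes.

```python
from typing import Dict, Hashable, List, Sequence


def fd_holds(rows: List[Dict[str, Hashable]], X: Sequence[str], Y: Sequence[str]) -> bool:
    """Return True if the functional dependency X -> Y holds in this relation state."""
    seen = {}  # maps each observed t[X] value to the t[Y] value first seen with it
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical relation state r of R = (A, B); the values are made up for illustration.
r = [
    {"A": "a1", "B": "b1"},
    {"A": "a1", "B": "b1"},
    {"A": "a2", "B": "b1"},
]

print(fd_holds(r, ["A"], ["B"]))  # True: equal A values always come with equal B values
print(fd_holds(r, ["B"], ["A"]))  # False: b1 appears with both a1 and a2
```

Note that such a check can only show that a dependency is violated or not violated by one particular state; whether the dependency is a constraint on R depends on the meaning of the attributes.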
Reflexivity rule
If α is a set of attributes and β ⊆ α, then α → β holds.
Augmentation rule
If α → β holds and γ is a set of attributes, then γα → γβ holds.
Transitivity rule
If α → β holds and β → γ holds, then α → γ holds.
Union rule
If α → β holds and α → γ holds, then α → βγ holds.
Decomposition rule
If α → βγ holds, then α → β holds and α → γ holds.
Pseudotransitivity rule
If α → β holds and γβ → δ holds, then αγ → δ holds.
Let us apply our rules to the example of schema R = (A, B, C, G, H, I) and the
set F of functional dependencies {A → B, A → C, CG → H, CG → I, B → H}.
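As an illustration of how the rules can be applied to this F, the sketch below encodes three of them as small Python functions and derives A → H, CG → HI and AG → I. The helper names (fd, transitivity, union, pseudotransitivity) are hypothetical, and attributes are assumed to be single letters.

```python
from typing import FrozenSet, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from strings of single-letter attribute names, e.g. fd("CG", "H")."""
    return (frozenset(lhs), frozenset(rhs))


def transitivity(f: FD, g: FD) -> FD:
    """From a -> b and b -> c conclude a -> c."""
    if f[1] != g[0]:
        raise ValueError("right side of the first FD must equal left side of the second")
    return (f[0], g[1])


def union(f: FD, g: FD) -> FD:
    """From a -> b and a -> c conclude a -> bc."""
    if f[0] != g[0]:
        raise ValueError("both FDs must have the same left side")
    return (f[0], f[1] | g[1])


def pseudotransitivity(f: FD, g: FD) -> FD:
    """From a -> b and (gamma b) -> d conclude (a gamma) -> d."""
    if not f[1] <= g[0]:
        raise ValueError("right side of the first FD must be part of the second FD's left side")
    gamma = g[0] - f[1]
    return (f[0] | gamma, g[1])


a_h = transitivity(fd("A", "B"), fd("B", "H"))         # A -> H
cg_hi = union(fd("CG", "H"), fd("CG", "I"))            # CG -> HI
ag_i = pseudotransitivity(fd("A", "C"), fd("CG", "I"))  # AG -> I

for lhs, rhs in (a_h, cg_hi, ag_i):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```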
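Trivial functional dependency
If a functional dependency X->Y holds where Y is a subset of X, then
this dependency is called a trivial functional dependency.
For example:
Consider a table with two columns Student_Id and Student_Name.
{Student_Id, Student_Name} -> Student_Id is a trivial functional dependency as
Student_Id is a subset of {Student_Id, Student_Name}.
Also, Student_Id -> Student_Id and Student_Name -> Student_Name are trivial
dependencies.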
Non-trivial functional dependency
If a functional dependency X->Y holds where Y is not a subset of X, then
this dependency is called a non-trivial functional dependency.
For example:
Consider an employee table with three attributes: emp_id, emp_name, emp_address.
The following functional dependencies are non-trivial:
emp_id -> emp_name (emp_name is not a subset of emp_id)
emp_id -> emp_address (emp_address is not a subset of emp_id)
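The subset test that separates trivial from non-trivial dependencies is a one-line check. A small Python sketch, with the hypothetical helper name is_trivial, applied to the Student and employee examples:

```python
def is_trivial(lhs, rhs):
    """X -> Y is trivial exactly when Y is a subset of X."""
    return set(rhs) <= set(lhs)


print(is_trivial({"Student_Id", "Student_Name"}, {"Student_Id"}))  # True
print(is_trivial({"emp_id"}, {"emp_name"}))                        # False
print(is_trivial({"emp_id"}, {"emp_address"}))                     # False
```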
Multivalued dependency:
A multivalued dependency (MVD) means that for a single value of attribute ‘a’,
multiple values of attribute ‘b’ exist. A multivalued dependency occurs when
a table has more than one independent multivalued attribute.
For example:
Consider a bike manufacturing company that produces two colors (Black and
Red) of each model every year.
bike_model   manuf_year   color
M1001        2007         Black
M1001        2007         Red
M2012        2008         Black
M2012        2008         Red
M2222        2009         Black
M2222        2009         Red
Here columns manuf_year and color are independent of each other and
dependent on bike_model. In this case these two columns are said to be
multivalued dependent on bike_model. These dependencies can be represented
like this:
bike_model ->-> manuf_year
bike_model ->-> color
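One way to check a multivalued dependency X ->-> Y on a table is to verify that, for every X value, the Y values and the values of the remaining attributes occur in all combinations. The Python sketch below applies this test to the bike table; the function name mvd_holds is hypothetical.

```python
from itertools import product


def mvd_holds(rows, X, Y):
    """Return True if X ->-> Y holds: within each X group, (Y, rest) is a full cross product."""
    Z = [a for a in rows[0] if a not in X and a not in Y]  # remaining attributes
    groups = {}
    for t in rows:
        x = tuple(t[a] for a in X)
        y = tuple(t[a] for a in Y)
        z = tuple(t[a] for a in Z)
        ys, zs, pairs = groups.setdefault(x, (set(), set(), set()))
        ys.add(y)
        zs.add(z)
        pairs.add((y, z))
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())


bikes = [
    {"bike_model": "M1001", "manuf_year": 2007, "color": "Black"},
    {"bike_model": "M1001", "manuf_year": 2007, "color": "Red"},
    {"bike_model": "M2012", "manuf_year": 2008, "color": "Black"},
    {"bike_model": "M2012", "manuf_year": 2008, "color": "Red"},
    {"bike_model": "M2222", "manuf_year": 2009, "color": "Black"},
    {"bike_model": "M2222", "manuf_year": 2009, "color": "Red"},
]

print(mvd_holds(bikes, ["bike_model"], ["manuf_year"]))  # True
print(mvd_holds(bikes, ["bike_model"], ["color"]))       # True
```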
Transitive dependency:
A functional dependency is said to be transitive if it is indirectly formed by
two functional dependencies.
X -> Z is a transitive dependency if the following three functional
dependencies hold true:
⚫ X->Y
⚫ Y does not ->X
⚫ Y->Z
Example:
Book                 Author                Author_age
Game of Thrones      George R. R. Martin   66
Harry Potter         J. K. Rowling         49
Dying of the Light   George R. R. Martin   66
{Book} -> {Author} (if we know the book, we know the author's name)
{Author} does not -> {Book}
{Author} -> {Author_age}
Therefore {Book} -> {Author_age} is a transitive dependency.
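The three conditions can be checked against the rows of the Book table. The Python sketch below is only an illustration on this one table state; the helper name fd_holds is hypothetical, and the author name in the third row is spelled out in full.

```python
def fd_holds(rows, X, Y):
    """Return True if X -> Y holds in the given rows (no two rows agree on X but differ on Y)."""
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


books = [
    {"Book": "Game of Thrones", "Author": "George R. R. Martin", "Author_age": 66},
    {"Book": "Harry Potter", "Author": "J. K. Rowling", "Author_age": 49},
    {"Book": "Dying of the Light", "Author": "George R. R. Martin", "Author_age": 66},
]

print(fd_holds(books, ["Book"], ["Author"]))        # True:  X -> Y
print(fd_holds(books, ["Author"], ["Book"]))        # False: Y does not -> X
print(fd_holds(books, ["Author"], ["Author_age"]))  # True:  Y -> Z
print(fd_holds(books, ["Book"], ["Author_age"]))    # True:  X -> Z, the transitive dependency
```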
Update anomaly:
In the above table we have two rows for employee Rick as he belongs to two
departments of the company. If we want to update the address of Rick then we
have to update the same in two rows or the data will become inconsistent. If
somehow, the correct address gets updated in one department but not in other then
as per the database, Rick would be having two different addresses, which is not
correct and would lead to inconsistent data.
Insert anomaly:
Suppose a new employee joins the company, who is under training and currently
not assigned to any department then we would not be able to insert the data into
the table if emp_dept field doesn’t allow nulls.
Delete anomaly:
Suppose that at some point the company closes the department D890. Deleting the
rows that have emp_dept as D890 would also delete the information of employee
Maggie, since she is assigned only to this department.
To overcome these anomalies, we need to normalize the data.
Normalization
Normalization is the process of organizing the data in a database to avoid data
redundancy, which in turn avoids the insertion, update and deletion anomalies.
The table is in 1NF because each attribute has atomic values. However, it is not in
2NF because the non-prime attribute teacher_age is dependent on teacher_id alone,
which is a proper subset of the candidate key, i.e., teacher_id → teacher_age.
This violates the rule for 2NF, which says “no non-prime attribute is
dependent on a proper subset of any candidate key of the table”.
Third Normal Form (3NF)
According to Codd’s original definition, a relation schema R is in 3NF if it
satisfies 2NF and no nonprime attribute of R is transitively dependent on the
primary key (it should not have transitive dependencies).
A relation is in 3NF if at least one of the following conditions holds for every
non-trivial functional dependency X → Y:
X is a super key.
Y is a prime attribute (each element of Y is part of some candidate key).
emp_id   emp_name   emp_zip   emp_state   emp_city   emp_district
To make this table comply with 3NF, we have to break it into two tables to
remove the transitive dependency:
Employee table
emp_id   emp_name   emp_zip
1001     John       282005
1002     Ajeet      222008
1006     Lora       282007
1101     Lilly      292008
1201     Steve      222999
Employee_zip
264 India
364 UK
264 283
264 300
364 232
364 549
From B → CD we can derive B → C and B → D by using the decomposition rule.
Let us consider A → B and B → C. From these two functional dependencies we
can derive A → C by applying the transitivity rule.
From A → B and B → D we can derive A → D.
So, final F+ = {A → BC, B → CD, A → B, A → C, B → C, B → D, A → D}.
Attribute Closure
Let X → Y be a functional dependency. The attribute closure of X, denoted
X+, with respect to F is the set of all attributes A such that X → A can
be inferred using Armstrong's axioms.
Algorithm
Closure = X;
Repeat until there is no change in Closure
{
If there is an FD U → V in F
such that U ⊆ Closure,
then set Closure = Closure ∪ V.
}
A → BC, B → CD.
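The closure algorithm translates almost line for line into Python. The sketch below assumes single-letter attribute names and assumes that the set F being used here is {A → BC, B → CD}; the function name closure is hypothetical.

```python
def closure(attrs, fds):
    """Compute the attribute closure of attrs under the FDs, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until there is no change in the closure
        changed = False
        for lhs, rhs in fds:
            # If U is contained in the closure, add V to the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("A", "BC"), ("B", "CD")]  # assumed reading: A -> BC, B -> CD

print(sorted(closure("A", F)))  # ['A', 'B', 'C', 'D']
print(sorted(closure("B", F)))  # ['B', 'C', 'D']
```

Since A+ contains D, the dependency A → D follows from F, which agrees with deriving it by the transitivity rule.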
Super Key
Let R be the relational schema, and X be a set of attributes over R. If X+
contains all the attributes of R, then X is said to be a super key of R.
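The super key test is a closure computation followed by a comparison with R, and a candidate key is a super key with no smaller super key inside it. The Python sketch below tries attribute sets in order of increasing size; it is a brute-force illustration for small schemas, using the schema R = (A, B, C, G, H, I) and the set F given earlier in this unit. The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_super_key(X, R, fds):
    """X is a super key of R if X+ contains every attribute of R."""
    return closure(X, fds) >= set(R)


def candidate_keys(R, fds):
    """Return all minimal super keys, trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_super_key(s, R, fds):
                keys.append(s)
    return keys


R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(is_super_key("AG", R, F))                   # True
print(is_super_key("A", R, F))                    # False
print([sorted(k) for k in candidate_keys(R, F)])  # [['A', 'G']]
```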
Problem 1
Given a relation R(A, B, C, D) and the functional dependency set FD = { AB →
CD, B → C }, determine whether R is in 2NF. If not, convert it into 2NF.
Problem 2
Given a relation R(P, Q, R, S, T) and the functional dependency set FD = { PQ → R,
S → T }, determine whether R is in 2NF. If not, convert it into 2NF.
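A possible way to check answers to Problems 1 and 2 is to list the partial dependencies, that is, non-prime attributes determined by a proper subset of a candidate key. The Python sketch below is a brute-force illustration with hypothetical function names; it assumes single-letter attribute names.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(R, fds):
    """Return all minimal super keys, trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(R):
                keys.append(s)
    return keys


def partial_dependencies(R, fds):
    """List (subset of a key, non-prime attribute) pairs that violate 2NF."""
    keys = candidate_keys(R, fds)
    non_prime = set(R) - set().union(*keys)
    found = set()
    for key in keys:
        for size in range(1, len(key)):  # proper, non-empty subsets of the key
            for combo in combinations(sorted(key), size):
                for attr in (closure(combo, fds) - set(combo)) & non_prime:
                    found.add(("".join(combo), attr))
    return sorted(found)


problem1 = partial_dependencies("ABCD", [("AB", "CD"), ("B", "C")])
problem2 = partial_dependencies("PQRST", [("PQ", "R"), ("S", "T")])
print(problem1)  # [('B', 'C')]
print(problem2)
```

An empty list means the relation is in 2NF. For Problem 1 the only candidate key is AB and the single violation is B → C, so one commonly used 2NF decomposition is R1(B, C) and R2(A, B, D).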
Problem 3
Data Redundancy
Data redundancy in database means that some data fields are repeated in the
database. This data repetition may occur either if a field is repeated in two or
more tables or if the field is repeated within the table.
Disadvantages of data redundancy
1. Increases the size of the database unnecessarily.
2. Causes data inconsistency.
3. Decreases the efficiency of the database.
4. May cause data corruption.
Decomposition:
Decomposition is the process of dividing a relation into smaller tables to remove
anomalies. Decomposition makes it easier to find data in the database and removes
inconsistency and duplication. Decomposition means replacing a relation with a
collection of smaller relations.
Properties of Decomposition: While decomposing a relation R into R1, R2, …, Rn,
the decomposition must satisfy certain properties, such as the lossless join
property and the dependency preservation property.
R(A, B, C)
A   B   C
1   2   1
2   5   3
3   3   3
Suppose this relation is decomposed into two sub-relations R1(A, B) and R2(B, C):
R1(A, B)      R2(B, C)
A   B         B   C
1   2         2   1
2   5         5   3
3   3         3   3
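Whether this decomposition loses information can be checked by joining R1 and R2 back together on the common attribute B and comparing the result with R. The Python sketch below does this for the relation state shown; the function names project and natural_join are hypothetical.

```python
def project(rows, attrs):
    """Project rows onto attrs, removing duplicate tuples."""
    result = []
    for t in rows:
        p = {a: t[a] for a in attrs}
        if p not in result:
            result.append(p)
    return result


def natural_join(r1, r2):
    """Join two relations on all attribute names they have in common."""
    common = set(r1[0]) & set(r2[0])
    return [
        {**t1, **t2}
        for t1 in r1
        for t2 in r2
        if all(t1[a] == t2[a] for a in common)
    ]


def as_set(rows):
    """Convert rows to a set of tuples so that relations can be compared."""
    return {frozenset(t.items()) for t in rows}


R = [
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 5, "C": 3},
    {"A": 3, "B": 3, "C": 3},
]
R1 = project(R, ["A", "B"])
R2 = project(R, ["B", "C"])

joined = natural_join(R1, R2)
print(as_set(joined) == as_set(R))  # True: the join returns exactly the original tuples
```

If the join produced extra tuples that are not in R (spurious tuples), the decomposition would be lossy for this state.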
Dependency preservation is another desirable property of a decomposition D of a
relational database schema: each functional dependency X -> Y
specified in F either appears directly in one of the relation schemas Ri of the
decomposition D or can be inferred from the dependencies that appear in some Ri.
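Dependency preservation can be tested without computing the projected dependency sets explicitly: start from X and repeatedly add whatever the closure of the part of the result inside Ri yields inside that same Ri. The Python sketch below illustrates this standard test. The relation R(A, B, C, D) with {AB → CD, B → C} is taken from Problem 1, while the decomposition into R1(B, C) and R2(A, B, D) is an assumption made only for this example; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(fd, decomposition, fds):
    """Return True if fd = (X, Y) can be enforced using only the schemas Ri."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            ri = set(schema)
            # Attributes of Ri determined by the part of result that lies in Ri.
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


F = [("AB", "CD"), ("B", "C")]
D = ["BC", "ABD"]  # assumed decomposition: R1(B, C), R2(A, B, D)

print(all(is_preserved(f, D, F) for f in F))  # True
```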
Step 2.
Prime attributes are those attributes that are part of a candidate key ({A, C} in
this example); the others are non-prime ({B, D, E} in this example).
Step 3.
The relation R is in 1st normal form, as a relational DBMS does not allow
multi-valued or composite attributes.
The relation is in 2nd normal form because no non-prime attribute depends on a
proper subset of the candidate key AC: BC->D satisfies 2NF (BC is not a proper
subset of candidate key AC), AC->BE satisfies 2NF (AC is the candidate key) and
B->E satisfies 2NF (B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because in BC->D neither is BC a super
key nor is D a prime attribute, and in B->E neither is B a super key nor is E a
prime attribute. To satisfy 3rd normal form, either the LHS of an FD should be a
super key or the RHS should be a prime attribute.
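The 3NF reasoning in Step 3 can be reproduced mechanically. The Python sketch below assumes the relation is R(A, B, C, D, E) with the dependencies BC->D, AC->BE and B->E mentioned in the step (the statement of the schema itself is an assumption, since only the dependencies and the key AC appear here). The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(R, fds):
    """Return all minimal super keys, trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(R):
                keys.append(s)
    return keys


def third_nf_violations(R, fds):
    """Return the FDs whose LHS is not a super key and whose RHS is not all prime."""
    prime = set().union(*candidate_keys(R, fds))
    violations = []
    for lhs, rhs in fds:
        new_attrs = set(rhs) - set(lhs)  # ignore the trivial part of the FD
        if not new_attrs:
            continue
        if closure(lhs, fds) >= set(R):  # LHS is a super key
            continue
        if new_attrs <= prime:           # every RHS attribute is prime
            continue
        violations.append((lhs, rhs))
    return violations


R = "ABCDE"  # assumed schema
F = [("BC", "D"), ("AC", "BE"), ("B", "E")]

print([sorted(k) for k in candidate_keys(R, F)])  # [['A', 'C']]
print(third_nf_violations(R, F))                  # [('BC', 'D'), ('B', 'E')]
```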
Example
Consider the below schema, “if a company makes a product and an agent is an agent
for that company, then he always sells that product for the company”. Under these
circumstances, the ACP table is shown as:
The relation ACP is again decomposed into 3 relations. Now, the natural Join
of all the three relations will be shown as:
Result of natural join of R1 and R3 over ‘Company’ and then natural join of
R13 and R2 over ‘Agent’ and ‘Product’ is given below.
Hence, in this example, all the redundancies are eliminated, and the
decomposition of ACP is a lossless join decomposition. Hence the relation is in
5NF as it does not violate the property of lossless join.
Inclusion Dependencies
An inclusion dependency is a statement that the values in some columns of a
relation are contained in the columns of another relation.
Example: Foreign Key.
We should not split groups of attributes that participate in an inclusion
dependency.
For example, if AB is inclusion dependent on CD (that is, AB ⊆ CD), then while
decomposing the schema that contains AB, we should ensure that at least one of
the schemas obtained in the decomposition contains both A and B.
Most inclusion dependencies are key-based.
In an E-R diagram, ISA hierarchies also lead to key-based
inclusion dependencies.
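An inclusion dependency such as a foreign key can be checked by comparing column values across two relations. The Python sketch below is only an illustration: the tables emp and dept, their columns and all values are hypothetical and do not come from these notes, and the function name inclusion_holds is also hypothetical.

```python
def inclusion_holds(r, r_cols, s, s_cols):
    """Return True if every r[r_cols] value combination also appears in s[s_cols]."""
    referenced = {tuple(t[c] for c in s_cols) for t in s}
    return all(tuple(t[c] for c in r_cols) in referenced for t in r)


# Hypothetical data: emp.dept_id plays the role of a foreign key to dept.dept_id.
dept = [{"dept_id": "D1"}, {"dept_id": "D2"}]
emp = [
    {"emp_id": 1, "dept_id": "D1"},
    {"emp_id": 2, "dept_id": "D2"},
    {"emp_id": 3, "dept_id": "D1"},
]

print(inclusion_holds(emp, ["dept_id"], dept, ["dept_id"]))  # True

# A row that references a department missing from dept breaks the dependency.
emp_bad = emp + [{"emp_id": 4, "dept_id": "D9"}]
print(inclusion_holds(emp_bad, ["dept_id"], dept, ["dept_id"]))  # False
```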