
UNIT 3

Normalization

Normalization is the process of organizing the data in the database.
Normalization is used to minimize redundancy in a relation or set of
relations. It is also used to eliminate undesirable characteristics such as
insertion, update and deletion anomalies.
Normalization divides a larger table into smaller tables and links them
using relationships.
Normal forms are used to reduce redundancy in database tables.
Keys in DBMS

• A KEY in a DBMS is an attribute or set of attributes that helps you identify a
row (tuple) in a relation (table). Keys also allow you to find the relationship
between two tables. A key uniquely identifies a row in a table by a combination of
one or more columns in that table.
• Why do we need a key?
• Here are some reasons for using keys in a DBMS.
• Keys help you identify any row of data in a table. In a real-world application, a
table could contain thousands of records, and some records could hold the same
values. Keys ensure that you can uniquely identify a table record despite these
challenges.
• Keys allow you to establish and identify relationships between tables.
• Keys help you enforce identity and integrity in the relationship.
What is a Primary Key?
A PRIMARY KEY is a column or group of columns in
a table that uniquely identifies every row in that table.
A primary key value can't be duplicated, meaning the
same value can't appear more than once in the table. A
table cannot have more than one primary key.
Rules for defining a primary key:
• Two rows can't have the same primary key value.
• Every row must have a primary key value.
• The primary key field cannot be null.
• The value in a primary key column should not be
modified or updated if any foreign key refers to that
primary key.
• ALTERNATE KEYS are columns or groups of columns, other than the
primary key, that uniquely identify every row in a table. A table can have
multiple choices for a primary key, but only one can be set as the primary
key. All the candidate keys that are not chosen as the primary key are
called alternate keys.
• A CANDIDATE KEY is a set of attributes that uniquely identifies tuples in
a table. A candidate key is a super key with no redundant attributes.
The primary key should be selected from the candidate keys. Every
table must have at least one candidate key. A table can have
multiple candidate keys but only a single primary key.
• Properties of a candidate key:
• It must contain unique values.
• It may have multiple attributes.
• It must not contain null values.
• It should contain the minimum fields needed to ensure uniqueness.
• It uniquely identifies each record in a table.
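The candidate key properties above can be illustrated with a small Python sketch. It searches sample rows for minimal column sets whose values are unique. The function name `candidate_keys` and the sample data (`emp_id`, `email`, `name`) are hypothetical and not from the slides. Uniqueness in a sample only suggests candidates; the real candidate keys follow from the rules of the schema.

```python
from itertools import combinations

def candidate_keys(rows, columns):
    """Return minimal column sets whose values are unique across the sample rows."""
    keys = []
    # Trying smaller sets first means any key found is minimal.
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            # A superset of an existing key is a super key, not a candidate key.
            if any(set(k) <= set(combo) for k in keys):
                continue
            seen = {tuple(row[c] for c in combo) for row in rows}
            if len(seen) == len(rows):
                keys.append(combo)
    return keys

# Hypothetical sample data: names repeat, ids and emails do not.
rows = [
    {"emp_id": 1, "email": "a@example.com", "name": "Ann"},
    {"emp_id": 2, "email": "b@example.com", "name": "Ann"},
    {"emp_id": 3, "email": "c@example.com", "name": "Bob"},
]
keys = candidate_keys(rows, ["emp_id", "email", "name"])
```

Here `emp_id` and `email` each qualify on their own, so one would be chosen as the primary key and the other would become an alternate key.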
• A FOREIGN KEY is a column (or group of columns) that creates a
relationship between two tables. The purpose of foreign keys is
to maintain data integrity and allow navigation
between two different instances of an entity. It acts
as a cross-reference between two tables because it
references the primary key of another table.
• A COMPOSITE KEY is a combination of two or more
columns that uniquely identifies rows in a table. The
combination of columns guarantees uniqueness,
though the individual columns do not.
Hence, they are combined to uniquely identify
records in a table.
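As a rough illustration of the key types described above, the following sketch declares them with Python's built-in sqlite3 module. The tables and columns (`department`, `employee`, `assignment`, `email`) and the helper `try_insert` are hypothetical names chosen for the example, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE department (
    dept_id   TEXT PRIMARY KEY,           -- primary key
    dept_name TEXT NOT NULL UNIQUE        -- alternate key
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    email   TEXT NOT NULL UNIQUE,         -- alternate key
    dept_id TEXT NOT NULL REFERENCES department(dept_id)  -- foreign key
);
CREATE TABLE assignment (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_no INTEGER,
    hours   INTEGER,
    PRIMARY KEY (emp_id, proj_no)         -- composite key
);
""")

conn.execute("INSERT INTO department VALUES ('D1', 'Operations')")
conn.execute("INSERT INTO employee VALUES (1, 'dana@example.com', 'D1')")

def try_insert(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Same primary key value twice: rejected.
duplicate_pk = try_insert("INSERT INTO employee VALUES (1, 'x@example.com', 'D1')")
# Foreign key pointing at a department that does not exist: rejected.
dangling_fk = try_insert("INSERT INTO employee VALUES (2, 'y@example.com', 'D9')")
```

Both inserts fail, which shows the database itself enforcing uniqueness and referential integrity.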
Codd’s 12 rules
• Dr. E. F. Codd published a list of rules in 1985 to define
relational databases. They are known as Codd's 12 rules, although
there are thirteen of them, numbered zero to twelve.

• Rule Zero
• The system must qualify as relational, as a database, and
as a management system. For a system to qualify as a
relational database management system (RDBMS), it
must use its relational facilities (exclusively) to
manage the database.
• The other 12 rules derive from this rule. The rules are as
follows:
Rule 1: Information rule
All information (including metadata) is to be represented as
stored data in cells of tables. The rows and columns have to
be strictly unordered.

Rule 2: Guaranteed Access
Each unique piece of data (atomic value) should be accessible
by: table name + primary key (row) + attribute (column).
NOTE: The ability to access data directly via a pointer is a
violation of this rule.
Rule 3: Systematic treatment of NULL
Null has several meanings: it can mean missing data, not
applicable, or no value. It should be handled consistently.
Also, a primary key must never be null. An expression
on NULL must give NULL.
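A quick way to see the last point of Rule 3 is to evaluate expressions on NULL in an actual SQL engine. This minimal sketch uses Python's built-in sqlite3 module as one example system. Other engines may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Arithmetic and comparison involving NULL both evaluate to NULL (None in Python).
null_sum, null_eq = conn.execute("SELECT NULL + 1, NULL = NULL").fetchone()
```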
• Rule 4: Active Online Catalog
• The database dictionary (catalog) is the structural description of
the complete database, and it must be stored online. The
catalog must be governed by the same rules as the rest of the
database. The same query language should be used on the
catalog as is used to query the database.
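As one concrete example of Rule 4, SQLite keeps its catalog in a table named `sqlite_master` that can be read with ordinary SELECT statements. The sketch below assumes Python's built-in sqlite3 module, and the `employee` table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")

# The catalog is itself a table, queried with the same SELECT syntax as user data.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```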

• Rule 5: Powerful and Well-Structured Language
• One well-structured language must provide all
manner of access to the data stored in the database.
Example: SQL. If the database allows access to the data
without the use of this language, then that is a violation.
• Rule 6: View Updating Rule
• All the views that are theoretically updatable
should be updatable by the system as well.
• Rule 7: Relational Level Operation
• There must be insert, delete and update
operations at each level of relations. Set
operations like union, intersection and minus
should also be supported.
• Rule 8: Physical Data Independence
• The physical storage of data should not matter to the
system. If, say, some file supporting a table is renamed or
moved from one disk to another, it should not affect the
application.
• Rule 9: Logical Data Independence
• If there is a change in the logical structure (table structures)
of the database, the user's view of the data should not change.
Say a table is split into two tables: a new view should
give the result as the join of the two tables. This rule is the most
difficult to satisfy.
• Rule 10: Integrity Independence
• The database should be able to enforce its own integrity
rather than relying on other programs. Key and check
constraints, triggers, etc. should be stored in the data dictionary.
This also makes the RDBMS independent of the front end.

• Rule 11: Distribution Independence
• A database should work properly regardless of its
distribution across a network. Even if a database is
geographically distributed, with data stored in pieces, the
end user should get the impression that it is stored in the
same place. This lays the foundation of distributed
databases.
• Rule 12: Nonsubversion Rule
• If low-level access is allowed to a system, it
should not be able to subvert or bypass the
integrity rules to change the data (for example,
by bypassing a relational security or integrity
constraint). This can be achieved by some sort
of locking or encryption.
Functional dependency

• A functional dependency is a relationship that exists
when one attribute uniquely determines another
attribute.
• If R is a relation with attributes X and Y, a functional
dependency between the attributes is represented
as X → Y, which specifies that Y is functionally dependent
on X. Here X is the determinant set and Y is the
dependent attribute. Each value of X is associated
with precisely one Y value.
• Functional dependencies play a vital role in
telling good database design from bad.
In this example, if we know the value of Employee number, we can obtain
Employee Name, City, Salary, etc.
So we can say that City, Employee Name and Salary are
functionally dependent on Employee number.

Employee number   Employee Name   Salary   City
1                 Dana            50000    San Francisco
2                 Francis         38000    London
3                 Andrew          25000    Tokyo
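One way to make the definition concrete is to test a dependency against sample rows. The sketch below is a minimal check, assuming rows are held as Python dictionaries. The function name `fd_holds` and the short column names are illustrative choices, not from the slides. A sample can only refute a dependency; it cannot prove that it holds for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every value of lhs maps to exactly one value of rhs in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen and returns the stored value after that.
        if seen.setdefault(x, y) != y:
            return False
    return True

employees = [
    {"EmpNo": 1, "Name": "Dana", "Salary": 50000, "City": "San Francisco"},
    {"EmpNo": 2, "Name": "Francis", "Salary": 38000, "City": "London"},
    {"EmpNo": 3, "Name": "Andrew", "Salary": 25000, "City": "Tokyo"},
]

emp_determines_rest = fd_holds(employees, ["EmpNo"], ["Name", "Salary", "City"])
```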


Types of Functional Dependencies

• Trivial functional dependency
• Non-Trivial functional dependency
• Multivalued functional dependency
• Transitive functional dependency
• Fully Functional Dependency
Trivial Functional Dependency

• In a trivial functional dependency, the dependent
is always a subset of the determinant. In other
words, a functional dependency is called trivial
if the attributes on the right side are a
subset of the attributes on the left side of the
functional dependency.
• X → Y is called a trivial functional dependency
if Y is a subset of X.

Employee_Id   Name    Age
1             Zayn    24
2             Phobe   34
3             Hikki   26
4             David   29

• Here, { Employee_Id, Name } → { Name } is a trivial functional
dependency, since the dependent Name is a subset of the
determinant { Employee_Id, Name }.
• { Employee_Id } → { Employee_Id }, { Name } → { Name } and
{ Age } → { Age } are also trivial.
• Non-Trivial Functional Dependency
• It is the opposite of a trivial functional dependency.
Formally speaking, in a non-trivial functional
dependency, the dependent is not a subset of the
determinant.
• X → Y, where X and Y are sets of attributes, is called
a non-trivial functional dependency if Y is not a
subset of X.
• For example, consider the Employee table above.
• Here, { Employee_Id } → { Name } is a
non-trivial functional dependency
because Name (the dependent) is not a
subset of Employee_Id (the determinant).
• Similarly, { Employee_Id, Name } → { Age } is
also a non-trivial functional dependency.
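The trivial versus non-trivial distinction reduces to a subset test, sketched below in Python. The function name `is_trivial` is an illustrative choice.

```python
def is_trivial(lhs, rhs):
    """An FD lhs -> rhs is trivial when rhs is a subset of lhs."""
    return set(rhs) <= set(lhs)

trivial_example = is_trivial({"Employee_Id", "Name"}, {"Name"})
non_trivial_example = is_trivial({"Employee_Id"}, {"Name"})
```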
• Multivalued Functional Dependency
• In a multivalued functional dependency, the attributes in the
dependent set are not dependent on each other.
• For example, given X → { Y, Z }, if there is no functional
dependency between Y and Z, then it is called
a multivalued functional dependency.
• For example, consider the same Employee table.
• Here, { Employee_Id } → { Name, Age } is a multivalued
functional dependency, since the dependent
attributes Name and Age are not functionally dependent on
each other (i.e. neither Name → Age nor Age → Name holds).
• Note that this is distinct from the multivalued dependency
(X →→ Y) that 4NF deals with.
• Transitive Functional Dependency
• Consider two functional dependencies A →
B and B → C. According to the transitivity
axiom, A → C must also hold. This is called a
transitive functional dependency.
• In other words, the dependent is indirectly
dependent on the determinant in a transitive
functional dependency.
• For example, consider the Employee table
below.

Employee_Id   Name    Department   Street Number
1             Zayn    CD           11
2             Phobe   AB           24
3             Hikki   CD           11
4             David   PQ           71
5             Phobe   LM           21

• Here, { Employee_Id → Department } and { Department → Street
Number } hold. Hence, according to the axiom of
transitivity, { Employee_Id → Street Number } is a valid
functional dependency.
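Transitivity can be applied mechanically by computing an attribute closure: start from a set of attributes and keep adding whatever the known dependencies determine. The sketch below is one simple way to do this in Python. The function name `closure` and the representation of dependencies as pairs of sets are assumptions made for the example.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already known, the right side follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [
    ({"Employee_Id"}, {"Department"}),
    ({"Department"}, {"Street Number"}),
]
reachable = closure({"Employee_Id"}, fds)
```

Because `Street Number` ends up in the closure of `Employee_Id`, the dependency Employee_Id → Street Number follows from the two given dependencies.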
• Fully Functional Dependency
• It is relevant in a relation with a composite
primary key. It occurs when one or more non-key
attributes depend on all parts of the
composite key, and not on any proper subset of it.
• Partial Dependency
• A partial dependency occurs when a non-prime
attribute is functionally dependent on part of a
candidate key.
• The 2nd Normal Form (2NF) eliminates
partial dependencies.
<StudentProject>
The prime (key) attributes are StudentID and ProjectNo.
As stated, for a partial dependency the non-prime
attributes, i.e. StudentName and ProjectName,
must be functionally dependent on part of a candidate key.
StudentName can be determined by StudentID alone, which makes
the relation partially dependent.
ProjectName can be determined by ProjectNo alone,
which also makes the relation partially dependent.

StudentID   ProjectNo   StudentName   ProjectName
S01         199         Katie         Geo Location
S02         120         Ollie         Cluster Exploration
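The partial dependencies in this example can be found mechanically. The sketch below assumes a single candidate key and dependencies given as pairs of Python sets. The function name `partial_dependencies` is hypothetical.

```python
def partial_dependencies(key, fds):
    """Return FDs whose determinant is a proper subset of the key
    and whose dependents include non-prime attributes."""
    found = []
    for lhs, rhs in fds:
        non_prime = rhs - key          # attributes that are not part of the key
        if lhs < key and non_prime:    # '<' on sets means proper subset
            found.append((lhs, non_prime))
    return found

key = {"StudentID", "ProjectNo"}
fds = [
    ({"StudentID"}, {"StudentName"}),
    ({"ProjectNo"}, {"ProjectName"}),
]
partials = partial_dependencies(key, fds)
```

Both dependencies are reported, so this relation is not in 2NF.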
Transitive dependency:
When an indirect relationship causes a functional dependency, it is called a
transitive dependency.
If P → Q and Q → R hold, then P → R is a transitive dependency.
Example:
{Company} → {CEO} (if we know the company, we know its CEO's name)
{CEO} → {Age} (if we know the CEO, we know the CEO's age)
Therefore, according to the rule of transitive dependency,
{Company} → {Age} should hold. That makes sense because if we know the
company name, we can find its CEO's age.
Note: A transitive dependency can only occur in a relation of three
or more attributes.

Company     CEO             Age
Microsoft   Satya Nadella   51
Google      Sundar Pichai   46
Alibaba     Jack Ma         54
The Normalization Process
What is Decomposition?
• Decomposition: the process of breaking
something down into parts or elements.

• Decomposition in a database means breaking
tables down into multiple tables.

• From a database perspective, it means moving to a
higher normal form.
• It replaces a relation with a collection of smaller relations.
• It should always be lossless, so that the
information in the original relation can be accurately
reconstructed from the decomposed relations.
• Without a proper decomposition of the relation,
problems such as loss of information may arise.
• Decomposition in DBMS removes redundancy, anomalies
and inconsistencies from a database by dividing the table
into multiple tables.
There are two types of decomposition :
• Lossless Decomposition
• Lossy Decomposition

• Lossless Decomposition: A decomposition is
lossless if it is feasible to reconstruct relation R
from the decomposed tables using joins. This is
the preferred choice. No information is lost
from the relation when it is decomposed, and the
join results in the same original relation.
Lossless Decomposition
A decomposition is lossless if we can recover the original relation:
R(A,B,C)
  Decompose into: R1(A,B) and R2(A,C)
  Recover by join: R’(A,B,C)
R’(A,B,C) should be the same as R(A,B,C).
Must ensure R’ = R
• Lossless Join Decomposition:
• "The decomposition of relation R into R1 and
R2 is lossless when the join of R1 and R2 yields
the same relation as R."
STUDENT:

Roll_no   Sname     Dept
111       parimal   COMPUTER
222       parimal   ELECTRICAL

Stu_name:

Roll_no   Sname
111       parimal
222       parimal

Stu_dept:

Roll_no   Dept
111       COMPUTER
222       ELECTRICAL
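The STUDENT example can be checked directly: project the relation onto the two attribute sets, join the projections on their common attribute, and compare with the original. The sketch below is a small set-based illustration in Python. The helper names `project` and `natural_join` are hypothetical, and this is not how a DBMS implements joins.

```python
def project(rows, attrs):
    """Project rows onto attrs; using a set drops duplicate tuples."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Join two projections on every attribute name they have in common."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(tuple(sorted({**d1, **d2}.items())))
    return joined

student = [
    {"Roll_no": 111, "Sname": "parimal", "Dept": "COMPUTER"},
    {"Roll_no": 222, "Sname": "parimal", "Dept": "ELECTRICAL"},
]
original = {tuple(sorted(row.items())) for row in student}

stu_name = project(student, ["Roll_no", "Sname"])
stu_dept = project(student, ["Roll_no", "Dept"])
rejoined = natural_join(stu_name, stu_dept)
lossless = rejoined == original
```

The join on Roll_no gives back exactly the two original rows, so this decomposition is lossless.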
<EmpInfo>

Emp_ID   Emp_Name   Emp_Age   Emp_Location   Dept_ID   Dept_Name
E001     Jacob      29        Alabama        Dpt1      Operations
E002     Henry      32        Alabama        Dpt2      HR
E003     Tom        22        Texas          Dpt3      Finance

<EmpDetails>

Emp_ID   Emp_Name   Emp_Age   Emp_Location
E001     Jacob      29        Alabama
E002     Henry      32        Alabama
E003     Tom        22        Texas

<DeptDetails>

Dept_ID   Emp_ID   Dept_Name
Dpt1      E001     Operations
Dpt2      E002     HR
Dpt3      E003     Finance


<EmpDetails>

Emp_ID   Emp_Name   Emp_Age   Emp_Location
E001     Jacob      29        Alabama
E002     Henry      32        Alabama
E003     Tom        22        Texas

<DeptDetails>

Dept_ID   Dept_Name
Dpt1      Operations
Dpt2      HR
Dpt3      Finance

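Now you won’t be able to join the above tables to recover the original
relation, since Emp_ID isn’t part of the DeptDetails relation and the two
tables share no common attribute. This decomposition is lossy.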
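To see why the missing common attribute matters, the following sketch pairs the rows of the two tables the only way that is possible without a shared column. It is a plain Python illustration using the data above. The variable names are hypothetical.

```python
emp_details = [
    ("E001", "Jacob", 29, "Alabama"),
    ("E002", "Henry", 32, "Alabama"),
    ("E003", "Tom", 22, "Texas"),
]
dept_details = [("Dpt1", "Operations"), ("Dpt2", "HR"), ("Dpt3", "Finance")]
emp_info = [
    ("E001", "Jacob", 29, "Alabama", "Dpt1", "Operations"),
    ("E002", "Henry", 32, "Alabama", "Dpt2", "HR"),
    ("E003", "Tom", 22, "Texas", "Dpt3", "Finance"),
]

# With no shared column, a join can only pair every employee with every department.
joined = [e + d for e in emp_details for d in dept_details]
# Rows that were never in the original relation are spurious tuples.
spurious = [row for row in joined if row not in emp_info]
```

The join produces nine rows, six of which are spurious, so it is no longer possible to tell which employee belongs to which department.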
Conclusion
Decomposing is the act of breaking tables down in
order to achieve a higher normal form.

Decompositions should always be lossless.
• This confirms that information in the original relation
can be accurately reconstructed based on the
decomposed relations.

Remember that for a decomposition to be considered
“GOOD” it must also preserve functional dependencies.
Normalization
• Normalization is a method of organizing the data in
the database which helps you avoid data
redundancy and insertion, update and deletion anomalies. It
is a process of analyzing the relation schemas based
on their functional dependencies and
primary keys.
• Normalization is inherent to relational database
theory. It usually results in the creation of
additional tables, with key columns repeated across
them so that the tables can be linked.
• Anomalies in DBMS
• Anomalies in a DBMS are caused when there is
too much redundancy in the database's
information. They often arise
when the tables that make up the database
are poorly constructed.
• There are three types of anomalies that occur
when the database is not normalized. These are
the insertion, update and deletion anomalies.
Worker_id   Worker_name   Worker_dept   Worker_address
65          Ramesh        ECT001        Jaipur
65          Ramesh        ECT002        Jaipur
73          Amit          ECT002        Delhi
76          Vikas         ECT501        Pune
76          Vikas         ECT502        Pune
79          Rajesh        ECT669        Mumbai
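The worker table can be used to act out two of the anomalies. The sketch below is a plain Python illustration on a few of the rows above. The new address "Agra" is a made-up value for the example.

```python
workers = [
    {"Worker_id": 65, "Worker_name": "Ramesh", "Worker_dept": "ECT001", "Worker_address": "Jaipur"},
    {"Worker_id": 65, "Worker_name": "Ramesh", "Worker_dept": "ECT002", "Worker_address": "Jaipur"},
    {"Worker_id": 79, "Worker_name": "Rajesh", "Worker_dept": "ECT669", "Worker_address": "Mumbai"},
]

# Update anomaly: Ramesh's address is stored twice, and only one copy is changed.
workers[0]["Worker_address"] = "Agra"  # hypothetical new address
ramesh_addresses = {w["Worker_address"] for w in workers if w["Worker_id"] == 65}

# Deletion anomaly: removing the only row for department ECT669
# also removes everything known about Rajesh.
remaining = [w for w in workers if w["Worker_dept"] != "ECT669"]
rajesh_known = any(w["Worker_id"] == 79 for w in remaining)
```

After the partial update the table holds two different addresses for the same worker, and after the deletion Rajesh has disappeared entirely. An insertion anomaly is the mirror case: a new worker with no department yet cannot be recorded without leaving Worker_dept empty.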


Objective of Normalization
To create tables that have the following characteristics:

1. Each table represents a single subject. For example, a course table
will have only data related to courses, and a student table will
have only data related to students.

2. No data item will be unnecessarily stored in more than one
table. The reason for this requirement is to ensure that data are
updated in only one place.

3. All attributes in a table are dependent on the primary key.
Normal Forms
NORMAL FORM                      CHARACTERISTICS
FIRST NORMAL FORM (1NF)          NO REPEATING GROUPS AND PRIMARY KEY IDENTIFIED
SECOND NORMAL FORM (2NF)         1NF AND NO PARTIAL DEPENDENCY
THIRD NORMAL FORM (3NF)          2NF AND NO TRANSITIVE DEPENDENCY
BOYCE-CODD NORMAL FORM (BCNF)    EVERY DETERMINANT IS A CANDIDATE KEY
FOURTH NORMAL FORM (4NF)         3NF AND NO INDEPENDENT MULTIVALUED DEPENDENCY
1st NF

• It should only have single-valued (atomic)
attributes/columns.
• Values stored in a column should be of the
same domain.
• All the columns in a table should have unique
names.
• No repeating groups, and the primary key is
identified.
UN-NORMALIZED TABLE
PROJ_NUM   PROJ_NAME     EMP_NUM   EMP_NAME   JOB_CLASS         CHG_HOUR ($)   HOURS
15         Evergreen     103       Ramesh     Manager           100.00         24,40
                         101       Mohan      DBA               150            40,50
                         105       Mayank     DBA               150            50
                         106       William    Programmer        40             60
                         102       David      Analyst           200            40
16         Amber Wave    114       James      General support   18             100
                         118       Alice      DSS analyst       300            100
                         104       Anne       System analyst    105            40
22         Star flight   107       Maria      Programmer        40             30
                         111       John       DBA               150            50
                         113       James      DSS analyst       300            50
Conversion to 1NF
This is done in three steps.
1. Eliminate the repeating groups: eliminate the nulls by making
sure that each row of a repeating group contains
appropriate data values.
2. Identify the primary key: PROJ_NUM + EMP_NUM
3. Identify all dependencies:
1NF
PROJ_NUM, EMP_NUM → PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS
PROJ_NUM → PROJ_NAME
EMP_NUM → EMP_NAME, JOB_CLASS, CHG_HOUR
JOB_CLASS → CHG_HOUR
PROJ_NUM   PROJ_NAME     EMP_NUM   EMP_NAME   JOB_CLASS         CHG_HOUR ($)   HOURS
15         Evergreen     103       Ramesh     Manager           100.00         24
15         Evergreen     101       Mohan      DBA               150            40
15         Evergreen     114       James      General support   145            100
15         Evergreen     106       William    Programmer        40             60
15         Evergreen     102       David      Analyst           200            40
16         Amber Wave    114       James      General support   18             100
16         Amber Wave    118       Alice      DSS analyst       300            100
16         Amber Wave    103       Ramesh     Manager           100.00         24
22         Star flight   107       Maria      Programmer        40             30
22         Star flight   111       John       DBA               150            50
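Step 1 of the conversion, eliminating the blanks left by repeating groups, can be sketched as a fill-down over the rows. The example below is a minimal Python illustration. It keeps only four of the columns for brevity, uses `None` for a blank cell, and the function name `fill_down` is hypothetical.

```python
# (PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME); None marks a blank cell.
raw = [
    (15, "Evergreen", 103, "Ramesh"),
    (None, None, 101, "Mohan"),
    (None, None, 105, "Mayank"),
    (16, "Amber Wave", 114, "James"),
    (None, None, 118, "Alice"),
]

def fill_down(rows):
    """Copy the last seen project number and name into rows where they are blank."""
    filled, current = [], (None, None)
    for proj_num, proj_name, emp_num, emp_name in rows:
        if proj_num is not None:
            current = (proj_num, proj_name)
        filled.append(current + (emp_num, emp_name))
    return filled

flat = fill_down(raw)
# Step 2: PROJ_NUM + EMP_NUM should now identify every row.
keys = [(row[0], row[2]) for row in flat]
key_is_unique = len(set(keys)) == len(keys)
```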


Characteristics of 1NF
• All key attributes are defined.
• There are no repeating groups in the table. In
other words, each row/column intersection
contains one and only one value.
• All attributes are dependent on the primary
key.
2NF
• For a table to be in the Second Normal Form,
it must satisfy two conditions:
• The table should be in the First Normal Form.
• There should be no Partial Dependency.
2NF
PROJ_NUM, EMP_NUM → PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS

PARTIAL DEPENDENCY: A dependency based on only part of a composite
primary key is called a partial dependency.

Example
PROJ_NUM → PROJ_NAME
EMP_NUM → EMP_NAME, JOB_CLASS, CHG_HOUR
Conversion to 2NF
• There are only two steps to convert 1NF to 2NF.
1. Write each key component on a separate line
A. PROJ_NUM
B. EMP_NUM
C. PROJ_NUM EMP_NUM
Here each component will become the key in a
new table, so the original table will be divided into
three tables:
PROJECT,
EMPLOYEE,
ASSIGNMENT
Conversion to 2NF CONTD.
2. Assign the corresponding dependent attributes

PROJECT(PROJ_NUM,PROJ_NAME)

EMPLOYEE(EMP_NUM,EMP_NAME,JOB_CLASS,CHG_HOUR)

ASSIGNMENT(PROJ_NUM,EMP_NUM,ASSIGN_HOURS)
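The two steps amount to projecting the 1NF table onto each key and its dependent attributes, and dropping the duplicate rows. The sketch below shows this in plain Python on a few rows taken from the 1NF table. The variable names are illustrative only.

```python
# (PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)
rows = [
    (15, "Evergreen", 103, "Ramesh", "Manager", 100.0, 24),
    (15, "Evergreen", 101, "Mohan", "DBA", 150, 40),
    (15, "Evergreen", 106, "William", "Programmer", 40, 60),
    (16, "Amber Wave", 103, "Ramesh", "Manager", 100.0, 24),
    (22, "Star flight", 107, "Maria", "Programmer", 40, 30),
]

# Each projection keeps one key and the attributes that depend on it;
# building a set removes the rows that were repeated in the 1NF table.
project = sorted({(r[0], r[1]) for r in rows})
employee = sorted({(r[2], r[3], r[4], r[5]) for r in rows})
assignment = sorted({(r[0], r[2], r[6]) for r in rows})
```

Ramesh appears twice in the 1NF rows but only once in `employee`, which is the redundancy that 2NF removes.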
Characteristics of 2NF
1. The table must be in First Normal Form.
2. It includes no partial dependency; that is, no
attribute is dependent on only a portion of the
primary key.

• To remove a partial dependency, we can divide the
table, remove the attribute which is causing the
partial dependency, and move it to some other
table where it fits well.
3NF
• It should be in the Second Normal Form.
• And it should not have any transitive dependency.

TRANSITIVE DEPENDENCY: A dependency of one
nonprime attribute on another nonprime attribute is called a transitive
dependency.
Example
JOB_CLASS → CHG_HOUR
Conversion to 3NF
• This is done in three steps.
Step 1. Identify each new determinant: a
determinant is any attribute whose value determines
other values within a row. For example, JOB_CLASS.

Step 2. Identify the dependent attributes: identify the attributes
that are dependent on the determinant identified in step 1.
In this case:
JOB_CLASS → CHG_HOUR
This will be a new table, which we may call JOB.
Conversion to 3NF contd.

Step 3. Remove the dependent attribute from the
transitive dependency.
In this case we will eliminate CHG_HOUR from the
EMPLOYEE table. So the new definition of the EMPLOYEE
table will be
EMP_NUM → EMP_NAME, JOB_CLASS
JOB_CLASS will remain in EMPLOYEE and will
serve as a foreign key (FK).
Conversion to 3NF contd

• After this process we will have 4 tables:

1. PROJECT(PROJ_NUM,PROJ_NAME)

2. EMPLOYEE(EMP_NUM,EMP_NAME,JOB_CLASS)

3. JOB(JOB_CLASS,CHG_HOUR)

4. ASSIGNMENT(PROJ_NUM,EMP_NUM,ASSIGN_HOURS)
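One possible way to declare these four tables and to recover the original wide rows by joining them is sketched below with Python's built-in sqlite3 module. The column types are assumptions, since the slides do not specify them, and only two sample assignments are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJECT (PROJ_NUM INTEGER PRIMARY KEY, PROJ_NAME TEXT);
CREATE TABLE JOB (JOB_CLASS TEXT PRIMARY KEY, CHG_HOUR REAL);
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_NAME  TEXT,
    JOB_CLASS TEXT REFERENCES JOB(JOB_CLASS)        -- foreign key to JOB
);
CREATE TABLE ASSIGNMENT (
    PROJ_NUM     INTEGER REFERENCES PROJECT(PROJ_NUM),
    EMP_NUM      INTEGER REFERENCES EMPLOYEE(EMP_NUM),
    ASSIGN_HOURS REAL,
    PRIMARY KEY (PROJ_NUM, EMP_NUM)                 -- composite key
);
INSERT INTO PROJECT VALUES (15, 'Evergreen');
INSERT INTO JOB VALUES ('Manager', 100.0), ('DBA', 150.0);
INSERT INTO EMPLOYEE VALUES (103, 'Ramesh', 'Manager'), (101, 'Mohan', 'DBA');
INSERT INTO ASSIGNMENT VALUES (15, 103, 24), (15, 101, 40);
""")

# Joining the four tables rebuilds the wide rows of the 1NF table.
rows = conn.execute("""
SELECT p.PROJ_NAME, e.EMP_NAME, j.JOB_CLASS, j.CHG_HOUR, a.ASSIGN_HOURS
FROM ASSIGNMENT a
JOIN PROJECT p ON p.PROJ_NUM = a.PROJ_NUM
JOIN EMPLOYEE e ON e.EMP_NUM = a.EMP_NUM
JOIN JOB j ON j.JOB_CLASS = e.JOB_CLASS
ORDER BY e.EMP_NUM
""").fetchall()
```

The charge per hour is now stored once per job class, so changing the DBA rate means updating a single row in JOB.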
Characteristics of 3NF
• It is in 2NF.
• It contains no transitive dependencies.
