You are on page 1of 65

NORMALIZATION MUHAMMAD HAMIZ MOHD RADZI

CONTENT
Database Tables and Normalization
The Need for Normalization
Normalization Process
Higher Level Normal Form (BCNF)
Normalization and Database Design
Denormalization

MUHAMMAD HAMIZ MOHD RADZI


OBJECTIVE
At the end of this lesson, you should be able to:

Describe database design and normalization


Explain the need for normalization
Do step by step normalization process
Do the Boyce-Codd Normal Form
Explain the denormalization process

MUHAMMAD HAMIZ MOHD RADZI


NORMALIZATION
 When we design a database for an enterprise, the main objective is to
create an accurate representation of the data, relationships between the
data, and constraints on the data that is pertinent to the enterprise.
 To help achieve this objective, we can use one or more database design
techniques and one of it is the normalization.
 Normalization is the technique for producing a set of relations with
desirable properties, given the data requirements of an enterprise.
 The purpose of normalization is to identify a suitable set of relations that
support the data requirements of an enterprise.

MUHAMMAD HAMIZ MOHD RADZI


 The characteristics of a suitable set of relations include the following:

the minimal number of attributes necessary to support the data requirements of


the enterprise

attributes with a close logical relationship (described as functional dependency)


are found in the same relation;

minimal redundancy with each attribute represented only once with the important
exception of attributes that form all or part of foreign keys, which are essential for
the joining of related relations.

 The benefits of using a database that has a suitable set of relations is that
the database will be easier for the user to access and maintain the data,
and take up minimal storage

MUHAMMAD HAMIZ MOHD RADZI


ANOMALIES
Insert anomaly: occurs when extra data beyond the desired data must be added to
the database. Eg: Salesperson cannot be inserted unless he/she is assigned to a
customer.

Update anomaly: exists when one or more instances of duplicated data is updated,
but not all. Eg: there are 3 duplicated records in database, but only one is updated
with the current data.

Delete anomaly: exists when certain attributes are lost because of the deletion of
other attributes. Eg: if the data is the only information left in the database, once it is
deleted, the information will be forever gone.

MUHAMMAD HAMIZ MOHD RADZI


FUNCTIONAL DEPENDENCIES
 An important concept associated with normalization is functional
dependency, which describes the relationship between attributes.

 Functional dependency is a property of the meaning or semantics of the


attributes in a relation.

 The semantics indicate how attributes relate to one another, and specify the
functional dependencies between attributes.

 Determinant refers to the attribute, or group of attributes, on the left-hand


side of the arrow of a functional dependency.

MUHAMMAD HAMIZ MOHD RADZI


MUHAMMAD HAMIZ MOHD RADZI
Tbl_Staff

Tbl_Branch
 Full functional dependency - Indicates that if A and B are attributes of a
relation, B is fully functionally dependent on A if B is functionally dependent
on A, but not on any proper subset of A.
 staffID →branchID

 Here means, branchID is full functional dependent on staffID.

 branchID is functionally dependent on staffID but it is not a subset of


STAFF as it have its own BRANCH table.

MUHAMMAD HAMIZ MOHD RADZI


 A functional dependency A→B is a partially dependency if there is some
attribute that can be removed from A and yet the dependency still holds.

 staffID, sName →branchID

 Previously we have:

 staffID →branchID

 Hence, we know that:

 branchID is functionally dependent on a subset of A (StaffID, Name),


namely StaffID.

MUHAMMAD HAMIZ MOHD RADZI


 Transitive dependency - A condition where A, B, and C are attributes of a
relation such that if A → B and B → C, then C is transitively dependent on
A via B (provided that A is not functionally dependent on B or C).

 staffID →sName, position, salary, branchID, bAddress


 branchID →bAddress

 So, staffID attribute functionally determines bAddress via branchID.

MUHAMMAD HAMIZ MOHD RADZI


• Table format, • 1 NF and no
no repeating partial
group and PK dependencies
identified

1NF 2 NF

BCNF 3 NF
• Every • 2 NF and no
determinant is a transitive
candidate key dependencies
(special case of
3 NF)
THE NEED FOR NORMALIZATION
Example: company that manages building projects
 Charges its clients by billing hours spent on each contract
 Hourly billing rate is dependent on employee’s position
 Periodically, a report is generated that contains information displayed in Table
5.1

MUHAMMAD HAMIZ MOHD RADZI


A SAMPLE REPORT LAYOUT

MUHAMMAD HAMIZ MOHD RADZI


A TABLE IN THE REPORT FORMAT

MUHAMMAD HAMIZ MOHD RADZI


THE NEED FOR NORMALIZATION ( CONTINUED)
Structure of data set in Figure 5.1 does not handle data very well

The table structure appears to work; report is generated with ease

Unfortunately, the report may yield different results, depending on what data
anomaly has occurred

MUHAMMAD HAMIZ MOHD RADZI


CONVERSION TO FIRST NORMAL FORM
Repeating group
 Derives its name from the fact that a group of multiple (related)
entries can exist for any single key attribute occurrence
Relational table must not contain repeating groups
Normalizing the table structure will reduce these data redundancies
Normalization is three-step procedure

MUHAMMAD HAMIZ MOHD RADZI


Step 1:
Eliminate the
Repeating
Groups

1 NF
Step 2:
Step 3:
Identify the
Primary Key Identify All
Dependencies
STEP 1: ELIMINATE THE REPEATING GROUPS
Present data in a tabular format, where each cell has a single value and there
are no repeating groups

Eliminate repeating groups by eliminating nulls, making sure that each


repeating group attribute contains an appropriate data value

MUHAMMAD HAMIZ MOHD RADZI


DATA ORGANIZATION: FIRST NORMAL FORM

MUHAMMAD HAMIZ MOHD RADZI


STEP 2: IDENTIFY THE PRIMARY KEY
Primary key must uniquely identify attribute value

New key must be composed

MUHAMMAD HAMIZ MOHD RADZI


STEP 3: IDENTIFY ALL DEPENDENCIES
Dependencies can be depicted with the help of a diagram
Dependency diagram:
 Depicts all dependencies found within a given table structure
 Helpful in getting bird’s-eye view of all relationships among a table’s
attributes
 Use makes it much less likely that an important dependency will be
overlooked

MUHAMMAD HAMIZ MOHD RADZI


A DEPENDENCY DIAGRAM: FIRST NORMAL FORM (1NF)

MUHAMMAD HAMIZ MOHD RADZI


FIRST NORMAL FORM
Tabular format in which:
 All key attributes are defined
 There are no repeating groups in the table
 All attributes are dependent on primary key
All relational tables satisfy 1NF requirements
Some tables contain partial dependencies
 Dependencies based on only part of the primary key
 Sometimes used for performance reasons, but should be used with caution
 Still subject to data redundancies

MUHAMMAD HAMIZ MOHD RADZI


CONVERSION TO SECOND NORMAL FORM
Relational database design can be improved by converting the database into
second normal form (2NF)

Two steps

Step 2:
Step 1: Write Each
Assign
Corresponding
2 NF Key Component
on a Separate Line
Dependent
Attributes

MUHAMMAD HAMIZ MOHD RADZI


STEP 1: IDENTIFY ALL KEY COMPONENTS
Write each key component on separate line, and then write the original
(composite) key on the last line

Each component will become the key in a new table

MUHAMMAD HAMIZ MOHD RADZI


STEP 2: IDENTIFY THE DEPENDENT ATTRIBUTES
Determine which attributes are dependent on which other attributes

At this point, most anomalies have been eliminated

MUHAMMAD HAMIZ MOHD RADZI


SECOND NORMAL FORM (2NF) CONVERSION RESULTS

MUHAMMAD HAMIZ MOHD RADZI


SECOND NORMAL FORM
Table is in second normal form (2NF) if:

 It is in 1NF and

 It includes no partial dependencies:

 No attribute is dependent on only a portion of the primary key

MUHAMMAD HAMIZ MOHD RADZI


CONVERSION TO THIRD NORMAL FORM
Step 1: Identify Each
New Determinant

3 NF
Step 2: Step 3:
Identify the Dependent Remove the Dependent
Attributes Attributes from
Transitive Dependencies

MUHAMMAD HAMIZ MOHD RADZI


STEP 1: IDENTIFY EACH NEW DETERMINANT
For every transitive dependency, write its determinant as a PK for a new table

 Determinant

 Any attribute whose value determines other values within a row

MUHAMMAD HAMIZ MOHD RADZI


STEP 2: IDENTIFY THE DEPENDENT ATTRIBUTES
Identify the attributes dependent on each determinant identified in Step 1
and identify the dependency

Name the table to reflect its contents and function

MUHAMMAD HAMIZ MOHD RADZI


STEP 3: REMOVE THE DEPENDENT ATTRIBUTES
FROM TRANSITIVE DEPENDENCIES
Eliminate all dependent attributes in transitive relationship(s) from each table
that has such a transitive relationship
Draw a new dependency diagram to show all tables defined in Steps 1–3
Check new tables and modified tables from Step 3 to make sure that each has
a determinant and does not contain inappropriate dependencies

MUHAMMAD HAMIZ MOHD RADZI


THIRD NORMAL FORM (3NF) CONVERSION RESULTS

MUHAMMAD HAMIZ MOHD RADZI


THIRD NORMAL FORM
A table is in third normal form (3NF) if:

 It is in 2NF and

 It contains no transitive dependencies

MUHAMMAD HAMIZ MOHD RADZI


IMPROVING THE DESIGN
Table structures are cleaned up to eliminate the troublesome initial partial and
transitive dependencies

Normalization cannot, by itself, be relied on to make good designs

It is valuable because its use helps eliminate data redundancies

MUHAMMAD HAMIZ MOHD RADZI


IMPROVING THE DESIGN ( CONTINUED)
The following changes were made:
 PK assignment
 Naming conventions
 Attribute atomicity
 Adding attributes
 Adding relationships
 Refining PKs
 Maintaining historical accuracy
 Using derived attributes
MUHAMMAD HAMIZ MOHD RADZI
THE COMPLETED DATABASE

MUHAMMAD HAMIZ MOHD RADZI


THE COMPLETED DATABASE ( CONTINUED)

MUHAMMAD HAMIZ MOHD RADZI


LIMITATIONS ON SYSTEM-ASSIGNED KEYS
System-assigned primary key may not prevent confusing entries

Data entries in Table 5.2 are inappropriate because they duplicate existing
records

 Yet there has been no violation of either entity integrity or referential


integrity

MUHAMMAD HAMIZ MOHD RADZI


DUPLICATE ENTRIES IN THE JOB TABLE

MUHAMMAD HAMIZ MOHD RADZI


NORMALIZATION AND DATABASE DESIGN
Normalization should be part of design process
Make sure that proposed entities meet required normal form before table
structures are created
Many real-world databases have been improperly designed or burdened
with anomalies if improperly modified during course of time
You may be asked to redesign and modify existing databases

MUHAMMAD HAMIZ MOHD RADZI


NORMALIZATION AND DATABASE DESIGN ( CONTINUED)
ER diagram
 Provides the big picture, or macro view, of an organization’s data
requirements and operations
 Created through an iterative process
 Identifying relevant entities, their attributes and their relationship
 Use results to identify additional entities and attributes

MUHAMMAD HAMIZ MOHD RADZI


NORMALIZATION AND DATABASE DESIGN (CONTINUED)
Normalization procedures

 Focus on the characteristics of specific entities

 A micro view of the entities within the ER diagram

Difficult to separate normalization process from ER modeling process

Two techniques should be used concurrently

MUHAMMAD HAMIZ MOHD RADZI


THE INITIAL ERD FOR A CONTRACTING COMPANY

MUHAMMAD HAMIZ MOHD RADZI


THE MODIFIED ERD FOR A CONTRACTING COMPANY

MUHAMMAD HAMIZ MOHD RADZI


THE INCORRECT REPRESENTATION OF A M:N
RELATIONSHIP

MUHAMMAD HAMIZ MOHD RADZI


THE FINAL (IMPLEMENTABLE) ERD FOR A CONTRACTING
COMPANY

MUHAMMAD HAMIZ MOHD RADZI


THE IMPLEMENTED DATABASE FOR THE CONTRACTING
COMPANY

MUHAMMAD HAMIZ MOHD RADZI


BOYCE-CODD NORMAL FORM (BCNF)
Every determinant in the table is a candidate key
 Determinant is an attribute whose value determines other values within a row
 3NF table with one candidate key is already in BCNF

Table is not BCNF if the determinant is not a candidate key.

MUHAMMAD HAMIZ MOHD RADZI


Consider the following relation and determinants.
R(a,b,c,d)
a,c -> b,d
a,d -> b
Here, the first determinant suggests that the primary key of R could be changed from
a,b to a,c. If this change was done all of the non-key attributes present in R could still
be determined, and therefore this change is legal.
However, the second determinant indicates that a,d determines b, but a,d could not
be the key of R as a,d does not determine all of the non key attributes of R (it does
not determine c).
We would say that the first determinate is a candidate key, but the second
determinant is not a candidate key, and thus this relation is not in BCNF (but is in 3rd
normal form).

MUHAMMAD HAMIZ MOHD RADZI


3NF TABLE NOT IN BCNF

MUHAMMAD HAMIZ MOHD RADZI


DECOMPOSITION OF TABLE STRUCTURE TO MEET
BCNF

MUHAMMAD HAMIZ MOHD RADZI


DECOMPOSITION INTO BCNF

MUHAMMAD HAMIZ MOHD RADZI


EXAMPLE OF BCNF
An example table from the University Database might be as follows:

This table is in 3NF already. Transform this into BCNF.

MUHAMMAD HAMIZ MOHD RADZI


If we know the Student Number and Teacher Code we know the Offering (class) the
student is in.
We also know the review date for that student and teacher (Student progress is
reviewed for that class by the teacher and student).
Hence, the dependencies:
S_Num, T_Code  Offering#, Review Date

The table is not in BCNF as if we know the offering number we know who the teacher
is.
Each offering can only have one teacher!

Offering#  T_Code

A non key attribute is a determinant.

MUHAMMAD HAMIZ MOHD RADZI


TRANSFORMING INTO BCNF
Step 1: The determinant, Offering#, becomes part of the key and the dependant
attribute T_Code, becomes a non key attribute.
So the Dependency diagram is now:

S_Num, Offering#  T_Code, Review Date

MUHAMMAD HAMIZ MOHD RADZI


Step 2: There are problems with this structure as T_Code is now dependant on only
part of the key.
This violates the rules for 2NF, so the table needs to be divided with the partial
dependency becoming a new table.
The dependencies would then be:

S_Num, Offering#  T_Code, Review Date


Offering#  T_Code

MUHAMMAD HAMIZ MOHD RADZI


Step 3: The original table is divided into two new tables. Each is in 3NF and in BCNF.
Dependencies: S_Num, Offering#  Review Date
Offering#  T_Code

MUHAMMAD HAMIZ MOHD RADZI


DENORMALIZATION
Creation of normalized relations is important database design goal

Processing requirements should also be a goal


If tables decomposed to conform to normalization requirements

 Number of database tables expands


Joining larger number of tables takes additional disk input/output (I/O) operations
and processing logic
 Reduces system speed
Conflicts among design efficiency, information requirements, and processing speed are
often resolved through compromises that may include denormalization

MUHAMMAD HAMIZ MOHD RADZI


DENORMALIZATION (CONTINUED)
Unnormalized tables in a production database tend to have these defects:

 Data updates are less efficient because programs that read and update tables
must deal with larger tables

 Indexing is much more cumbersome

 Unnormalized tables yield no simple strategies for creating virtual tables known
as views

Use denormalization cautiously

Understand why—under some circumstances—unnormalized tables are a better


choice
MUHAMMAD HAMIZ MOHD RADZI
CASE STUDY 1
Consider this following situation:

MODULE DEPARTMENT LECTURER TEXTS


M1 D1 L1 T1, T2
M2 D1 L1 T1, T3
M3 D1 L2 T4
M4 D2 L3 T1, T5
M5 D2 L4 T6

Normalize this table.

MUHAMMAD HAMIZ MOHD RADZI


CASE STUDY 2
Consider the following FD and
do the normalization.

MUHAMMAD HAMIZ MOHD RADZI


REFERENCES
Database Systems: Design, Implementation, & Management, 7th Edition, Rob & Coronel

Database Systems: A Practical Approach to Design, Implementation, and Management,


Thomas Connolly and Carolyn Begg, 5th Edition, 2010, Pearson.

Fundamental of Database Management Systems, Mark L. G., 2nd Edition, 2012, John
Wiley.

MUHAMMAD HAMIZ MOHD RADZI

You might also like