The problems associated with using a relation that is not appropriately normalized are known as anomalies. Anomalies
can potentially occur during changes to a database. An anomaly is a bad thing because data can become logically
corrupted.
Problems without Normalization
Without normalization, it becomes difficult to handle and update the database without risking data loss.
Insertion, update and deletion anomalies are very frequent if the database is not normalized. To understand
these anomalies, let us take the example of a Student table.
S_id  S_Name   S_Address       Subject_opted
401   Mulenga  22 Masuku Rd    Biology
402   Zulu     4 Orange close  Maths
403   Sinks    10 Mango St     Maths
401   Mulenga  22 Masuku Rd    Physics
Update anomaly: To update the address of a student who occurs two or more times in the table, we have to
update the S_Address column in all of that student's rows; otherwise the data becomes inconsistent.
Insertion anomaly: Suppose that for a new admission we have the student id (S_id), name and address of a
student, but the student has not opted for any subjects yet. We then have to insert NULL in Subject_opted,
leading to an insertion anomaly.
Deletion anomaly: If student 401 had only one subject and temporarily dropped it, deleting that row would
delete the entire student record along with it.
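The update and deletion anomalies can be reproduced on the Student table with a few lines of Python. This is only a sketch: the replacement address is a made-up value, and the deletion case uses student 402 (who really has a single subject in the sample data) rather than the hypothetical situation described for 401.

```python
# Rows of the Student table: (S_id, S_Name, S_Address, Subject_opted)
student = [
    (401, "Mulenga", "22 Masuku Rd", "Biology"),
    (402, "Zulu", "4 Orange close", "Maths"),
    (403, "Sinks", "10 Mango St", "Maths"),
    (401, "Mulenga", "22 Masuku Rd", "Physics"),
]

def addresses_of(rows, s_id):
    """Return the set of distinct addresses recorded for one student."""
    return {addr for (sid, _name, addr, _subj) in rows if sid == s_id}

# Update anomaly: change the address in only the first matching row.
NEW_ADDRESS = "7 Hypothetical Ave"  # made-up value, for illustration only
partial = list(student)
for i, (sid, name, addr, subj) in enumerate(partial):
    if sid == 401:
        partial[i] = (sid, name, NEW_ADDRESS, subj)
        break
inconsistent = addresses_of(partial, 401)  # student 401 now has two addresses

# Deletion anomaly: student 402 has one subject; deleting that row
# removes every trace of the student from the table.
after_delete = [r for r in student if not (r[0] == 402 and r[3] == "Maths")]
still_known = any(r[0] == 402 for r in after_delete)

print(len(inconsistent), still_known)
```

After the partial update, the table holds two different addresses for the same student, and after the deletion nothing in the table records that student 402 exists.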
Functional dependency
An important concept associated with normalization is functional dependency, which describes the relationship
between attributes.
Y is functionally dependent on X if the value of Y is determined by X. In other words, if Y = X + 1, the value of X will
determine the resultant value of Y. Thus, Y is dependent on X as a function of the value of X (X -> Y).
For example, if A and B are attributes of relation R, B is functionally dependent on A (A -> B) if each value of A is
associated with exactly one value of B. If we know the value of A and we examine the relation that holds this
dependency, we find only one value for B in all the tuples that have a given value of A, at any moment in time. If two
tuples have the same value of A, they also have the same value of B. However, for a given value of B, there may be
several different values of A.
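The definition above can be turned into a small check over sample rows. The helper name fd_holds is hypothetical. Such a check can only show that a dependency is not violated by the rows at hand; whether it truly holds is a property of the relation's meaning, not of one sample.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # First time we see this lhs value, remember its rhs value;
        # afterwards, any different rhs value violates the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True

# The Student table from the anomalies example.
student = [
    {"S_id": 401, "S_Name": "Mulenga", "S_Address": "22 Masuku Rd", "Subject_opted": "Biology"},
    {"S_id": 402, "S_Name": "Zulu", "S_Address": "4 Orange close", "Subject_opted": "Maths"},
    {"S_id": 403, "S_Name": "Sinks", "S_Address": "10 Mango St", "Subject_opted": "Maths"},
    {"S_id": 401, "S_Name": "Mulenga", "S_Address": "22 Masuku Rd", "Subject_opted": "Physics"},
]

id_to_details = fd_holds(student, ["S_id"], ["S_Name", "S_Address"])
id_to_subject = fd_holds(student, ["S_id"], ["Subject_opted"])
subject_to_id = fd_holds(student, ["Subject_opted"], ["S_id"])
print(id_to_details, id_to_subject, subject_to_id)
```

In this sample, S_id determines the name and address, but it does not determine Subject_opted (student 401 has two subjects), and Subject_opted does not determine S_id (two students take Maths).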
A -> B (B is functionally dependent on A)
Dependency, Determinants
Determinant: the attribute or group of attributes on the left-hand side of the arrow of a functional dependency. In
A -> B, A is the determinant of B.
(a) staffNo functionally determines position (staffNo -> position)
[Figure: each staffNo value shown (ZT05, ZT45, ZT30) maps to exactly one position]
(b) position does not functionally determine staffNo (position -> staffNo does not hold)
Full functional dependency: B is fully functionally dependent on A if B is functionally dependent on A but not on
any proper subset of A. In other words, B needs the whole of A: removing any attribute from A breaks the dependency.
When A is a single attribute, the dependency is automatically full. When A is composite (it contains more than one
field), every one of its fields must be required to determine B.
Partial functional dependency: a dependency A -> B is partial if some attribute can be removed from A and the
dependency still holds. For example, staffNo, sName -> branchNo is a partial dependency because branchNo is also
functionally dependent on a subset of (staffNo, sName), i.e. on staffNo alone.
Student
SNo   SName       CNo  CName   Addr              Instr.    Office
5425  Susan Ross  102  Calc I  …San Jose, CA     P. Smith  B42 Room 112
7845  Dave Turco  541  Bio 10  ...San Diego, CA  L. Talip  B24 Room 210
Functional dependencies: SNo -> SName, CNo -> CName, Instr -> Office, CNo -> Instr, SNo -> Addr
Axioms
1. Reflexivity Rule: If X is a set of attributes and Y is a subset of X, then X -> Y holds; that is,
each subset of X is functionally dependent on X.
2. Augmentation Rule: If X -> Y holds and W is a set of attributes, then WX -> WY holds.
3. Transitivity Rule: If X -> Y and Y -> Z hold, then X -> Z holds.
4. Union Rule: If X -> Y and X -> Z hold, then X -> YZ holds.
5. Decomposition Rule: If X -> YZ holds, then so do X -> Y and X -> Z.
6. Pseudotransitivity Rule: If X -> Y and WY -> Z hold, then so does WX -> Z.
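One common way to apply these rules mechanically is to compute the closure of a set of attributes: everything that the set determines under a given list of dependencies. The sketch below uses the five dependencies listed for the Student relation; the function name closure and the set-based encoding are choices made for this illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left-hand side, transitivity
            # lets us add the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Functional dependencies of the Student relation, as (lhs, rhs) pairs.
fds = [
    ({"SNo"}, {"SName"}),
    ({"CNo"}, {"CName"}),
    ({"Instr"}, {"Office"}),
    ({"CNo"}, {"Instr"}),
    ({"SNo"}, {"Addr"}),
]

sno_closure = closure({"SNo"}, fds)
cno_closure = closure({"CNo"}, fds)
both_closure = closure({"SNo", "CNo"}, fds)
print(sorted(sno_closure))
print(sorted(cno_closure))
print(sorted(both_closure))
```

The closure of CNo contains Office even though CNo -> Office is not listed: it follows from CNo -> Instr and Instr -> Office by the transitivity rule. Neither SNo nor CNo alone determines all seven attributes, but together they do.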
Normalization
The term normalization means to make normal in terms of causing something to conform to a standard, or to introduce
consistency with respect to style and content. In terms of relational database modeling, that consistency becomes a
process of removing duplication in data, among other factors. Removal of duplication tends to minimize redundancy.
Minimization of redundancy implies getting rid of unneeded data present in particular places, or tables. The goal of
normalization is to reduce problems with data consistency by reducing redundancy. That is, to identify a suitable set of
relations that support the data requirements of an enterprise.
The characteristics of a suitable set of relations include:
The minimum number of attributes necessary to support the data requirements of the enterprise;
Attributes with close logical relationship (functional dependency) are found in the same relation;
Minimum redundancy, with each attribute represented only once.
The sequence of steps involved in the normalization process is called Normal Forms.
Normalization is an incremental process i.e. each Normal Form layer adds to whatever Normal Forms have already
been applied. These steps are the 1st, 2nd, and 3rd Normal Forms.
Benefits of Normalization
Effectively minimizing redundancy is another way of describing removal of duplication. The effect of removing
duplication is as follows:
Physical space needed to store data is reduced thus minimizing costs;
Data becomes better organized, hence updates to the data stored in the database are achieved with a
minimum number of operations;
The data becomes easier for users to access and maintain.
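ClientRental (unnormalized; header and first row restored from the 1NF table that follows)
clientNo  cName          propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      John Kay       PG4         6 Lawrence St, Glasgow  1 Jul 07   31 Aug 08   350   CO40     Tina Murphy
                         PG16        5 Novar Dr, Glasgow     1 Sep 08   1 Sep 09    450   CO93     Tony Shaw
CR56      Aline Stewart  PG4         6 Lawrence St, Glasgow  1 Sep 06   10 Jun 07   350   CO40     Tina Murphy
                         PG36        2 Manor Rd, Glasgow     10 Oct 07  1 Dec 08    375   CO93     Tony Shaw
                         PG16        5 Novar Dr, Glasgow     1 Nov 09   10 Aug 10   450   CO93     Tony Shaw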
Repeating group = (propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName).
The problem with putting data in tables with repeating groups is that the table cannot be easily indexed or arranged so
that the information in the repeating group can be found without searching each record individually.
The solution is to eliminate repeating groups so that every record in every table can be identified uniquely. The table is
converted into a 1NF table with no repeating groups:
ClientRental
clientNo  cName          propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      John Kay       PG4         6 Lawrence St, Glasgow  01-Jul-07  31-Aug-08   350   CO40     Tina Murphy
CR76      John Kay       PG16        5 Novar Dr, Glasgow     01-Sep-08  01-Sep-09   450   CO93     Tony Shaw
CR56      Aline Stewart  PG4         6 Lawrence St, Glasgow  01-Sep-06  10-Jun-07   350   CO40     Tina Murphy
CR56      Aline Stewart  PG36        2 Manor Rd, Glasgow     10-Oct-07  01-Dec-08   375   CO93     Tony Shaw
CR56      Aline Stewart  PG16        5 Novar Dr, Glasgow     01-Nov-09  10-Aug-10   450   CO93     Tony Shaw
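The removal of the repeating group can be sketched in Python as follows. The nested layout and the "rentals" field name are assumptions made for this illustration; they simply model one client record that carries several property entries.

```python
# Unnormalized form: each client record carries a repeating group of rentals.
# The "rentals" key is a hypothetical name for the repeating group.
unnormalized = [
    {
        "clientNo": "CR76", "cName": "John Kay",
        "rentals": [
            ("PG4", "6 Lawrence St, Glasgow", "01-Jul-07", "31-Aug-08", 350, "CO40", "Tina Murphy"),
            ("PG16", "5 Novar Dr, Glasgow", "01-Sep-08", "01-Sep-09", 450, "CO93", "Tony Shaw"),
        ],
    },
    {
        "clientNo": "CR56", "cName": "Aline Stewart",
        "rentals": [
            ("PG4", "6 Lawrence St, Glasgow", "01-Sep-06", "10-Jun-07", 350, "CO40", "Tina Murphy"),
            ("PG36", "2 Manor Rd, Glasgow", "10-Oct-07", "01-Dec-08", 375, "CO93", "Tony Shaw"),
            ("PG16", "5 Novar Dr, Glasgow", "01-Nov-09", "10-Aug-10", 450, "CO93", "Tony Shaw"),
        ],
    },
]

GROUP = ("propertyNo", "pAddress", "rentStart", "rentFinish", "rent", "ownerNo", "oName")

def to_1nf(records):
    """Emit one flat row per repeating-group entry, copying the client fields."""
    flat = []
    for rec in records:
        for item in rec["rentals"]:
            row = {"clientNo": rec["clientNo"], "cName": rec["cName"]}
            row.update(zip(GROUP, item))
            flat.append(row)
    return flat

rows = to_1nf(unnormalized)
keys = [(r["clientNo"], r["propertyNo"]) for r in rows]
print(len(rows), len(set(keys)))
```

Every flat row holds a single value per attribute, and in this sample the pair (clientNo, propertyNo) is different for every row, so it can identify each row uniquely. The price is that cName is now repeated on every row for the same client.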
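[Figure: functional dependency diagram for the ClientRental relation; only the labels fd4, fd5 and fd6 survive]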
Using the functional dependencies above, we identify the partial dependencies on the primary key (clientNo,
propertyNo): in fd1, cName depends only on clientNo, and in fd2, pAddress, rent, ownerNo and oName depend only on
propertyNo. The attributes rentStart and rentFinish are fully dependent on the whole primary key, that is, on both
clientNo and propertyNo. Removing the partial dependencies results in three new relations called Client, Rental and
PropertyOwner. These three relations are in second normal form.
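As a rough cross-check, the partial dependencies can be searched for in the sample rows of the 1NF ClientRental table. The helper names are hypothetical, and a data-driven check like this only reports what the sample does not contradict; the real dependencies come from the meaning of the attributes.

```python
from itertools import combinations

COLUMNS = ("clientNo", "cName", "propertyNo", "pAddress",
           "rentStart", "rentFinish", "rent", "ownerNo", "oName")
DATA = [
    ("CR76", "John Kay", "PG4", "6 Lawrence St, Glasgow", "01-Jul-07", "31-Aug-08", 350, "CO40", "Tina Murphy"),
    ("CR76", "John Kay", "PG16", "5 Novar Dr, Glasgow", "01-Sep-08", "01-Sep-09", 450, "CO93", "Tony Shaw"),
    ("CR56", "Aline Stewart", "PG4", "6 Lawrence St, Glasgow", "01-Sep-06", "10-Jun-07", 350, "CO40", "Tina Murphy"),
    ("CR56", "Aline Stewart", "PG36", "2 Manor Rd, Glasgow", "10-Oct-07", "01-Dec-08", 375, "CO93", "Tony Shaw"),
    ("CR56", "Aline Stewart", "PG16", "5 Novar Dr, Glasgow", "01-Nov-09", "10-Aug-10", 450, "CO93", "Tony Shaw"),
]
rows = [dict(zip(COLUMNS, r)) for r in DATA]
KEY = ("clientNo", "propertyNo")

def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def partial_dependencies(rows, key):
    """Map each non-key attribute to a proper subset of the key that determines it."""
    found = {}
    non_key = [c for c in rows[0] if c not in key]
    for attr in non_key:
        for size in range(1, len(key)):
            for subset in combinations(key, size):
                if fd_holds(rows, subset, (attr,)):
                    found[attr] = subset
    return found

partial = partial_dependencies(rows, KEY)
for attr, subset in partial.items():
    print(attr, "depends only on", subset)
```

On this sample the search reports cName under clientNo and pAddress, rent, ownerNo and oName under propertyNo, while rentStart and rentFinish appear under neither, which agrees with the analysis in the text.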
Client
clientNo  cName
CR76      John Kay
CR56      Aline Stewart
Rental
clientNo  propertyNo  rentStart  rentFinish
CR76      PG4         01-Jul-07  31-Aug-08
CR76      PG16        01-Sep-08  01-Sep-09
CR56      PG4         01-Sep-06  10-Jun-07
CR56      PG36        10-Oct-07  01-Dec-08
CR56      PG16        01-Nov-09  10-Aug-10
PropertyOwner
propertyNo  pAddress                rent  ownerNo  oName
PG4         6 Lawrence St, Glasgow  350   CO40     Tina Murphy
PG16        5 Novar Dr, Glasgow     450   CO93     Tony Shaw
PG36        2 Manor Rd, Glasgow     375   CO93     Tony Shaw
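A decomposition is only useful if the original rows can be recovered from it. The sketch below joins the Client, Rental and PropertyOwner relations back together on clientNo and propertyNo and counts the stored values before and after. The variable and function names are chosen for this illustration.

```python
client = [("CR76", "John Kay"), ("CR56", "Aline Stewart")]
rental = [
    ("CR76", "PG4", "01-Jul-07", "31-Aug-08"),
    ("CR76", "PG16", "01-Sep-08", "01-Sep-09"),
    ("CR56", "PG4", "01-Sep-06", "10-Jun-07"),
    ("CR56", "PG36", "10-Oct-07", "01-Dec-08"),
    ("CR56", "PG16", "01-Nov-09", "10-Aug-10"),
]
property_owner = [
    ("PG4", "6 Lawrence St, Glasgow", 350, "CO40", "Tina Murphy"),
    ("PG16", "5 Novar Dr, Glasgow", 450, "CO93", "Tony Shaw"),
    ("PG36", "2 Manor Rd, Glasgow", 375, "CO93", "Tony Shaw"),
]

# Index the two lookup relations by their keys.
client_by_no = {c[0]: c for c in client}
property_by_no = {p[0]: p for p in property_owner}

def rebuild(rental_rows):
    """Join Rental with Client and PropertyOwner to recover ClientRental rows."""
    rows = []
    for client_no, property_no, start, finish in rental_rows:
        _, c_name = client_by_no[client_no]
        _, address, rent, owner_no, o_name = property_by_no[property_no]
        rows.append((client_no, c_name, property_no, address,
                     start, finish, rent, owner_no, o_name))
    return rows

client_rental = rebuild(rental)

# Count stored values before and after the decomposition.
values_1nf = sum(len(r) for r in client_rental)
values_2nf = sum(len(r) for r in client + rental + property_owner)
print(len(client_rental), values_1nf, values_2nf)
```

The join returns the same five rows as the 1NF ClientRental table, while the three smaller relations store fewer values in total because each client name and each property's details are recorded only once.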
EXERCISE:
1. In relational database development, what is the process of normalization intended to achieve?
2. Normalize the following customer record:
Customer Record
Customer No
Customer Firstname
Customer Surname
Address
Tel No
Supplier No
Supplier name
Supplier address
Stock No
Stock item
Stock cost
Description
Supplier Tel No
3. Design a set of three relations to represent this data that conform to First Normal Form (1NF), selecting
keys as necessary.
Emp_Proj
EmpNo ProjNo Hours EmpName ProjName ProjLocation
4. Members of a sports club can take up to two activities with a personal trainer, for which they
have to pay a fee depending on the activity, as shown below: