# Next Page

Up One Level

Lecture Slides available: PDF PowerPoint

Normalisation
Contents
y y y y y

y y y y y y

What is normalisation? Integrity Constraints Understanding Data o Student #2 - Flattened Table First Normal Form Flatten table and Extend Primary Key o Insertion anomaly: o Update anomaly o Deletion anomaly Decomposing the relation Second Normal Form Third Normal Form Summary: 1NF Summary: 2NF Summary: 3NF

What is normalisation?

Normalisation is the process of taking data from a problem and reducing it to a set of relations while ensuring da
y y

Data integrity - all of the data in the database are consistent, and satisfy all integrity constraints. Data redundancy ± if data in the database can be found in two different locations (direct redundancy) or if redundancy.

Data should only be stored once and avoid storing data that can be calculated from other data already held in the of breaking data integrity rules. If redundancy exists in the database then problems can arise when the database is in normal operation:
y y

When data is inserted the data must be duplicated correctly in all places where there is redundancy. For in creating a new employee entry requires that both tables be updated with the employee name. When data is modified in the database, if the data being changed has redundancy, then all versions of the

Hence we have the y referential integrity : This constraint involves the foreign keys. Functional dependancies are rules stating that given a certain set o The definition of a functional dependency looks like A->B. The removal of redundancy helps to prevent insertion. and each one must be uniquely identified. Integrity Constraints An integrity constraint is a rule that restricts the values that may be present in the database. such as 4th and 5th normal forms. deletion. 4. and update errors.K. They are rarely utilised in system design and ar To be in a particular form requires that the data meets the criteria to also be in all normal forms before that form. Basically the normal form 1. and M. We already know what relations are. The relational data m structure to it: y entity integrity : The rows (or tuples) in a relation represent entities.employee name must happen in both tables simultaneously.L. form. so it is vitally important that the in another relation. In this case B is a single attribute but it can be as man (the left hand side of the -> sign) can determine the set of attributes on the right hand side of the -> sign. Foreign keys tie the relations together. Consider this example: . This bas unique and selects a particular set of values for J. Understanding Data Sometimes the starting point for understanding data is given in the form of relations and functional dependancies problem. 1st Normal Form 2nd Normal Form 3rd Normal Form BCNF There are other normal forms. 3. The higher the form the more redundancy has been eliminated. 2. It can also be said that B is functionally dependent versa. since the data is only available The data in the database can be considered to be in one of a number of `normal forms'.

surname. tutor_name) tutor_number -> tutor_name Here there is a relation R. and therefore must Student . There is actually a second functional dependency for this relation. As an example tutor number 1 may be ³Mr Smith´. This is an implied functional dependency and is not normally listed in the list of functional depen Extracting understanding It is possible that the relations and the determinants have not yet been defined for a problem. and therefore is a candidate key in the repeating g . J 14/11/1977 White. tutor_number. date_of_birth. ( subject. which can be worked out from the relation itse attributes in R.an unnormalised tablewith repeating groups matric_no Name 960100 960105 960120 960145 960150 date_of_birth subject grade Smith. D 21/08/1973 Databases C Soft_Dev A ISDE D Soft_Dev B ISDE B Databases A Soft_Dev B Workshop C Databases B Databases Soft_Dev ISDE Workshop B D C D The subject/grade pair is repeated for each student. 960145 has 1 pair while 960150 has four. firstname. given a tutor_number. subject never seems to repeat for a single matric_no.R(matric_no. but tutor number 10 may also be ³Mr Smith´. Looking at the data one can s However. grade ) ) The repeating group needs a key in order that the relation can be correctly defined. Given a possible to work out if we are talking about tutor 1 or tutor 10. Repeating groups a Student(matric_no. it is always possible to work out the tutor_name. name. T 11/03/1970 Smith. and a functional dependency that indicates that: y y y instances of tutor_number are unique in the data from the data. J 09/01/1972 Black. A 10/05/1975 Moore.

so you may produce something like the following relation . ( subject. D 21/08/1973 Black. The name name. D 21/08/1973 Soft_Dev A ISDE ISDE D B Soft_Dev B Databases A Soft_Dev B Workshop C Databases B Databases B Soft_Dev D ISDE C Workshop B The table still shows the same data as the previous example. J 14/11/1977 Smith. name. J 14/11/1977 Databases C 960100 960100 960105 960105 960120 960120 960120 960145 960150 960150 960150 960150 Smith. we can see that the same name appears more than once in the name column. A 10/05/1975 Moore. We have removed the rep Sometimes you will miss spotting the repeating group. the information extracted is only as good as th made or dependencies extracted. but the format is different. date_of_birth. T 11/03/1970 Smith. What is important however is that the information extracted during these exerci Looking at the data itself. A 10/05/1975 White.Whenever keys or dependencies are extracted from example data. T 11/03/1970 Moore. It is also possible that the table Student #2 . J 14/11/1977 White. D 21/08/1973 Black. T 11/03/1970 Moore. grade ) ) As guidance in cases where a variety of keys could be selected one should try to select the relation with the least Flattened Tables Note that the student table shown above explicitly identifies the repeating group.Flattened Table matric_no name date_of_birth Subject grade 960100 Smith. J 09/01/1972 Black. date_of_birth -> matric_no This implies that not only is the matric_no sufficient to uniquely identify a student. D 21/08/1973 Black. the student¶s name combined the relation Student written as: Student(matric_no.

subject. ( subject. and it is called an `unnormalised table'. grade ) ) If the repeating group was flattened. strange anomalies appear in the system. grade ) matric_no -> name. First Normal Form y y y y y First normal form (1NF) deals with the `shape' of the record type A relation is in 1NF if. but as you will see the result of the normalisation proce explicitly does have a repeating group. matric_no can no longer be the primary key . date_of_birth name. Redundant data is the main cause Insertion anomaly: With the primary key including subject. Relational databases require that each row only has a single value per attribute.it does n a compound key since no single attribute can uniquely identify a row. we cannot enter a new student until they have at least one subject to stud . or decompose the relation. date_of_birth. and only if. we still have a problem. and so a repeating group in a row To remove the repeating group. it would look something like: Student(matric_no. date_of_birth.leading to First Normal Form Flatten table and Extend Primary Key The Student table with the repeating group can be written as: Student(matric_no. one of two things can be done: y y either flatten the table and extend the key. as in the Student #2 data table.Student(matric_no. Example: The Student table with the repeating group is not in 1NF It has repeating groups. name. name. subject. but we have created other complications. it contains no repeating attributes or groups of attributes. date_of_birth -> matric_no This data does not explicitly identify the repeating group. date_of_birth. grade ) Although this is an improvement. name. Every repetition of With the relation in its flattened form. The new primary key is a compound key ( We have now solved the repeating groups problem.

A 10/05/1975 Moore. y y This is known as the insertion anomaly.. It is difficult to insert new records into the database. this would require not one Deletion anomaly If all of the records for the `Databases' subject were deleted from the table. On a practical level. one for the repeating groups and one of the no the primary key for the original relation is included in both of the new relations Record matric_no subject 960100 960100 960100 960105 960105 . was changed to Green.subject before we can create a new record.. J..we would inadvertently lose all of the studying only one subject and the subject was deleted. J.J 14/11/1977 White... 960150 Student matric_no name 960100 960105 960120 date_of_birth grade Databases C Soft_Dev A ISDE ISDE . Update anomaly If the name of a student were changed for example Smith. Soft_Dev B Workshop B Smith. D B .T 11/03/1970 .. it also means that it is difficult to keep the data up to date. Again this problem arises from the need to have a compou Decomposing the relation y y The alternative approach is to split the table into two parts.

960145 960150 y y y Smith.it will allow us to find out which subjects a student is studying . grade ) y There are no repeating groups . Student and Record. student 960145 still exists in the Student table. subject. name. It does not always do so.matric_no. and all non-key attributes must depend on the whole key.D 21/08/1973 We now have two relations. name. n (PKDs). The problems arise when there is a compound key. Student contains the original non-repeating groups Record has the original repeating groups and the matric_no Student(matric_no. It cannot be the complete key to the new Record relation . and only if. Consider again the Student relation from the flattened Student #2 table: Student(matric_no. Second Normal Form Second normal form (or 2NF) is a more stringent normal form defined as: A relation is in 2NF if. and will not be entered in the Record table until they are studyin We have also removed the deletion anomaly If all of the `databases' subject records are removed. So in the Record relatio This method has eliminated some of the anomalies. subject.we between the two tables . it depends on the example chosen y y y y y y In this case we no longer have the insertion anomaly It is now possible to enter new students without knowing the subjects that they will be studying They will exist only in the Student table. In thi two key attributes. the key to the Record relation . grade ) Matric_no remains the key to the Student relation. it is in 1NF and every non-key attribute is fully functionally dependent on the Thus the relation is in 1NF with no repeating groups. This is what 2NF tries to prevent. We have also removed the update anomaly Student and Record are now in First Normal Form. e. date_of_birth. date_of_birth ) Record(matric_no. subject.J 09/01/1972 Black.g.

but not grade. we have a compound primary key . keyed on matric_no + subject . separate out all the attributes that are solely dependent on matric_no + subject put them into a separate Student relation.so we must check all of the non-key attributes against each matric_no determines name and date_of_birth. the solution is to split the relation up into its component parts.y y y y y The relation is already in 1NF However. with matric_no as the primary key separate out all the attributes that are solely dependent on subject. separate out all the attributes that are solely dependent on matric_no put them in a new Student_details relation. in this case no attributes are solely dependent on subject. So there is a problem with potential redundancies A dependency diagram is used to show how non-key attributes relate to each part or combination of parts in the p Figure : Depende y y y y y y y y y This relation is not in 2NF It appears to be two tables squashed into one. but not name or date_of_birth. subject together with matric_no determines grade.

Figure : Dependenci Interestingly this is the same set of relations as when we recognized that there were repeating terms in the table a normalizing. and only if. Third Normal Form 3NF is an even stricter normal form and removes virtually all the redundant data : y y A relation is in 3NF if. as the end result should be similar relations. it is in 2NF and there are no transitive functional dependencies Transitive functional dependencies arise: .

However. These relations are now in third normal form. One relation for each part of the key that has partially dependent attributes Summary: 3NF y y y y y A relation is in 3NF if it contains no repeating groups. remove the attributes involved in th Rule: A relation in 2NF with only one non-key attribute must be in 3NF In a normalised relation a non-key field must provide a fact about the key. the whole key and nothing but Relations in 3NF are sufficient for most practical database design problems. no partial functional dependencies.J y y y y 11 New Street Now we need to store the address only once If we need to know a manager's address we can look it up in the Manager relation The manager attribute is the link between the two tables. or Decompose the relation into smaller relations. and in the Projects table it is now a foreign key. Summary: 2NF y y y y y A relation is in 2NF if it contains no repeating groups and no partial key functional dependencies Rule: A relation in 1NF with a single key field must be in 2NF To convert a relation with partial functional dependencies to 2NF. Summary: 1NF y y y y y y A relation is in 1NF if it contains no repeating groups To convert an unnormalised relation to 1NF either: Flatten the table and change the primary key. This option is liable to give the best results. 3NF does not guar Up One Level Next Page . and no transitiv To convert a relation with transitive functional dependencies to 3NF. create a set of new relations: One relation for the attributes that are fully dependent upon the key. one for the repeating groups and one for the non-repeating Remember to put the primary key from the original relation into both new relations.B 32 High Street Smith.Black.

and therefore this change is legal. composite candidate keys with at least one attribute in common. BCNF is based on the concept of a determinant.c -> b.b. Consider the following relation and determinants.d d as a.e. 3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys i.b to a.d -> b Here.d does not determine all of the non key attributes of R (it does not determine c).Previous Page Up One Level Lecture Slides available: PDF PowerPoint Normalisation .BCNF Contents y y y y y y y Boyce-Codd Normal Form (BCNF) Normalisation to BCNF . every determinant is a candidate key. However. the second determinant indicates that a.Example 1 Summary . We would say that the first d .d a. anomalies may result even though the relation is in 3NF. A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependen A relation is in BCNF is. the first determinant suggests that the primary key of R could be changed from a.c. R(a. and only if.d) a. If this change wa could still be determined.c.Example 1 Example 2 Problems BCNF overcomes Returning to the ER Model Normalisation Example o Library Overview y y normalise a relation to Boyce Codd Normal Form (BCNF) Normalisation example Boyce-Codd Normal Form (BCNF) y y y y y y When a relation has more than one candidate key.

time.doctor) y 1NF Eliminate repeating groups. we could cho DB(Patno. the third their fat is removed by surgery. and so on.doctor) Patno -> PatName Patno. This depicts a special dieting clinic where the each patient has 4 the second they are exercised.PatName.appNo. None: DB(Patno.appNo.Example 1 Patient No Patient Name Appointment Id Time Doctor 1 John 0 09:00 Zorro 2 3 4 5 Kerr Adam Robert Zane 0 1 0 1 09:00 Killer 10:00 Zorro 13:00 Killer 14:00 Zorro Lets consider the database extract shown above. From the information we have. From this (hopefully) make-believe scenario w DB(Patno.PatName. otherwise the is either 09:00 or 13:00.appNo. Normalisation to BCNF .doctor) y 2NF Eliminate partial key dependencies .time.time.appNo.doctor) (example 1b) Example 1a .PatName.appNo.PatName. appointment 2 10:00 or 14:00.time.DB(Patno.appNo -> Time. and on the fourth their mouth is stitched c appointments! If the Patient Name begins with a letter before ³P´ they get a morning appointment.doctor) (example 1a) or DB(Patno.doctor Time -> appNo Now we have to decide what the primary key of DB is going to be. and thus this relation is not in BCNF (but is in 3rd normal form).time.determinant is not a candidate key.PatName.

time.appNo.appNo.doctor) This will not work.doctor) R1(Patno.appNo.appNo IS the key.doctor All LHS present. Is this a candidate key.PatName.doctor) R1(Patno.appNo) time is enough to work out the appointment number of a patient.DB(Patno. Now BCNF is satisfied. None: DB(Patno.DB(Patno.PatName) Go through all determinates where ALL of the left hand attributes are present in a relation and at least ONE of the relation.doctor) y 1NF Eliminate repeating groups.time. as you need both time and Patno together to form a unique key. so relevant. so relevant.PatName) R2(time. so not relevant.time.time. so compliance.PatName. Time -> appNo Time is present.time. Patno. but not PatName.appNo. Thus this determinate is not a We need to fix this.time.appNo) . Patno -> PatName Patno is present in DB. BCNF: rewrite to DB(Patno.appNo -> Time. and time and doctor also present.appNo.time.doctor) R1(Patno.doctor) y 2NF Eliminate partial key dependencies DB(Patno.doctor) R1(Patno.PatName) y 3NF Eliminate transitive dependencies None: so just as 2NF y y y y y y BCNF Every determinant is a candidate key DB(Patno. If it was then we could rewrite DB as: DB(Patno. and the final relations s Example 1b . and so is appNo. Is this a candidate key? Patno.PatName) R2(time.

time.appNo) Go through all determinates where ALL of the left hand attributes are present in a relation and at least ONE of the relation.Ctitle. so relevant. Unfortunately there are no rules as to which route will be the easiest one to take. so not relevant.doctor) R1(Patno.doctor Not all LHS present. Patno -> PatName Patno is present in DB. Time is currently the key for R2. so not relevant.PatName) R2(time.Major -> Grade StudNo.doctor) R1(Patno.appNo) Summary . and so is appNo. Example 2 Grade_report(StudNo.PatName) R2(time.Example 1 This example has demonstrated three things: y y y y BCNF is stronger than 3NF. However.Major -> Advisor Advisor -> Major . (CourseNo.CourseNo.InstrucName InstrucName -> InstrucLocn StudNo. Patno.StudName. This is a candidate key.(Major.Adviser. relations that are in 3NF are not necessarily in BCNF BCNF is needed in certain situations to obtain full understanding of the data model there are several routes to take to arrive at the same set of relations in BCNF. but not PatName. Time -> appNo Time is present.Grade))) y Functional dependencies StudNo -> StudName CourseNo -> Ctitle. so BCNF: as 3NF DB(Patno.InstrucName.InstructLocn.appNo -> Time.y 3NF Eliminate transitive dependencies None: so just as 2NF y y y y y y y y y BCNF Every determinant is a candidate key DB(Patno.time.