You are on page 1of 34

Database Normalization

To reduce redundancy and


improve integrity of databases
Reminder
1. Use this form to confirm your presence at a lecture
2. Test’s results
3. Review of the previous lecture (Relational Algebra)
4. Laboratory work #2

2
Database design approaches
Top-down design Bottom-up design

Entity-Relationship modelling: Normalization:

- Identifying entities, attributes and - Structuring relations to reduce


relationships redundancy
- Mapping diagrams to relations - Using formal rules of decomposition
(tables) (splitting relations into two)

In practice, both approaches are applied

3
Normalization definition
The process of structuring and removing redundant
data of relations in order to improve storage efficiency
and data integrity.

Normalization divides larger tables into smaller tables


and links them using relationships.

It’s applied only for one relation.

4
The reason to use normalization
1. Minimizing data redundancy

Book
Name Author name Author email

Book1 Author1 author1@email.com

Book2 Author1 author1@email.com

There are two book which are written by the same Author1. The author’s email
address is in the relation twice. It’s redundant.

5
The reason to use normalization
2. Eliminating modification anomalies:
- Update anomaly
- Delete anomaly
- Insert anomaly

6
Update anomaly
Book

Name Author name Author email

Book1 Author1 author1@email.com

Book2 Author1 author1@email.com

What if Author1’s email is going to be changed?


Every record of the Book relation must be updated. It is extra work to be done.

7
Delete anomaly
Book

Name Author name Author email

Book1 Author1 author1@email.com

Book2 Author1 author1@email.com

What if Book1 and Book2 are going to be deleted?


Having deleted both records authors information will be lost. It’s a harmful
side-effect.

8
Insert anomaly
Book

Name Author name Author email

Book1 Author1 author1@email.com

Book2 Author1 author1@email.com

Book3 Author1 author1@email.com

What if Book3 of Author1 is going to be inserted?


Having done that 3rd record of Author1’s email will be inserted. It’s unnecessary
(redundant) information.

9
Normal forms
Codd introduced the concept of normalization and what is now known as the first normal form (1NF) in 1970.

Codd went on to define the second normal form (2NF) and third normal form (3NF) in 1971.

Codd and Raymond F. Boyce defined the Boyce-Codd normal form (BCNF) in 1974.

A relational database relation is often described as "normalized" if it meets third normal form.

Most 3NF relations are free of insertion, update, and deletion anomalies.

10
Normal forms
Normalization is a database design technique, which is used to design a relational database table up to higher
normal form (NF).

The process is progressive, and a higher level of database normalization cannot be achieved unless the
previous levels have been satisfied.

1 NF 2 NF 3 NF 3 BCNF 4 NF 5 NF 6 NF

Used in practice

11
Normal forms - Functional dependency
If one set of attributes in a relation (A) determines another set of attributes in the relation, then the second set of
attributes (B) is said to be functionally dependent (FD) on the first set of attributes.

We write it: A→B

where A is determinant

“If you know A you can always say value of B”

Examples:
Student ID → Student Name
City→Country
Username→Profile
Profile→Username
Email→First Name, Last Name, Age

12
Normal forms - Functional dependency
The following is a complete system of rules for functional dependencies:

F1 (Reflexivity): If Y ⊆ X, then X→Y (parts depend on the whole)


If Author, Book → Pages then Author, Book → Book
F2 (Augmentation): If Z ⊆ W and X→Y, then XW→YZ
Pages ⊆ (Tables, Pages) and Book Name → Pictures
then (Book Name, Tables, Pages) → (Pictures, Pages)
F3 (Transitivity): If X→Y and Y→Z, then X→Z.
Other useful rules can be derived from these.
F4 (Pseudotransitivity): If X → Y and YW → Z, then XW→Z.
F5 (Decomposition): If X → Y, then X → A for every A in Y (right part is just one attribute)
If Author, Book → Tables, Pages, Pictures then
Author, Book → Tables Author, Book → Pages Author, Book → Pictures 13
Normal forms - Functional dependency
Because of F5, we assume, that all the FDs have only one attribute at their right side.
The FD X→Y is called non-trivial when Y is not an element of X. Note that a trivial FD is obeyed by all
instances.
A field Y is "functionally dependent" on a field (or fields) X if it is invalid to have two records with the same
X-value but different Y-values.

That is, a given X-value must always occur with the same Y-value.

It is like 1:1 but only from left to right (sometimes the opposite is true)

When X is a key, then all fields are by definition functionally dependent on X in a trivial way, since there
can't be two records having the same X value.

14
Normal forms - Functional dependency
Examples:

Let’s look at a relation Book:

Book (ID, Name, ISBN, Author Name, Author Email, Pages),


Keys:
ID is the primary key(PK), ISBN is the candidate key(CK) (equals to ID), ID+Name is a super key (not minimal)

Functional dependencies:
ID→Name, ISBN, Author Name, Author Email, Pages (By definition of FD and PK + CK)
ISBN→Name, ID, Author Name, Author Email, Pages (By definition of FD and PK + CK)
ISBN→Name
Author Name→Author Email
Author Email→Author Name (it is not always true that A→B and B→A !)
ID→Author Name, Author Email and because of F5 the next two:
ID→Author Name
ID→Author Email
... and others.. 15
Normal forms - Functional dependency
Exercise:

Student Department Subject Mark Date

S1 FPM Math 5 04.02.2020

S1 FPM History 4 02.03.2018

S2 FPM Math 4 01.04.2016

What are the functional dependencies?

16
Normal forms - Functional dependency
Exercise:

Student Department Subject Mark Date

S1 FPM Math 5 04.02.2020

S1 FPM History 4 02.03.2018

S2 FPM Math 4 01.04.2016

What are the functional dependencies?

Student, Subject, Date→Department, Mark (by definition of primary key)


Student→Department
Student, Subject, Date→Department (F5 rule)
Student, Subject, Date→Mark (F5 rule)
Department→Student (why?)
17
1 Normal form (1NF)
To satisfy 1NF, the values in each column of a table must be atomic:

● Each relation “cell” should contain a single value.


● Each record needs to be unique (it’s a property of relation).

Person Friend Person Friend


Ann Bill, Alice Ann Bill
Alex John, Mary Ann Alice
Ann Bill, Alice Alex John

Alex Mary

18
2 Normal form (2NF)
A relation is said to be in 2NF if both the following conditions hold:

● Table is in 1NF (First normal form)


● Every non-key attribute must functionally depend on the whole key, not a part of it

This 2NF only is an issue for relations with a composite primary key. Composite primary key is a primary key
consisting of two or more attributes.

If a relation has primary key consisting of just only one attribute then a relation satisfies 2NF automatically.

19
2 Normal form (2NF). Example
Parts in warehouses

Parts Warehouse Quantity Warehouse City

P1 W1 10 Kyiv

P2 W1 7 Kyiv

P1 W2 44 Lviv
Primary key:
(Parts, Warehouse)
Functional dependencies:
Parts, Warehouse→Quantity, Warehouse City (OK by Primary Key definition)
Parts, Warehouse→Quantity (OK)
Parts, Warehouse→Warehouse City (OK)
Warehouse→Warehouse City (Fails 2NF because Warehouse is just a part of the Primary Key)
Warehouse City→Warehouse (OK) 20
2 Normal form (2NF). Example
Parts in warehouses

Parts Warehouse Quantity Warehouse City

P1 W1 10 Kyiv

P2 W1 7 Kyiv

P1 W2 44 Lviv

QUANTITY is a fact about the combination of PART and WAREHOUSE,


but WAREHOUSE-ADDRESS is a fact about the WAREHOUSE alone (part of primary key)
The basic problems with this design are:
● The warehouse City is repeated in every record that refers to a part stored in that warehouse.
● If the City of the warehouse changes, every record referring to a part stored in that warehouse must be updated.
● Because of the redundancy, the data might become inconsistent, with different records showing different addresses for the same
warehouse.
● If at some point in time there are no parts stored in the warehouse, there may be no record in which to keep the warehouse's
address. 21
2 Normal form (2NF). Example
Foreign key (N:1)
Parts in warehouses Warehouse

Parts Warehouse Quantity Warehouse Warehouse City

P1 W1 10 W1 Kyiv

P2 W1 7 W2 Lviv

P1 W2 44

Functional dependencies: Functional dependencies:


Parts, Warehouse→Quantity Warehouse→Warehouse City
(OK by Primary Key definition) (OK by Primary Key definition)

22
2 Normal form (2NF). Exercise
TV programmes

TV Channel TV Channel Genre Program Program Duration

TV1 Sport Football Match 123

TV1 Sport Skiing 75

TV2 Movie Terminator 6 142

The task:
1) What are functional dependencies?
2) Does the relation satisfy 2NF? Why?

23
2 Normal form (2NF). Exercise
TV programmes

TV Channel TV Channel Genre Program Program Duration

TV1 Sport Football Match 123

TV1 Sport Skiing 75

TV2 Movie Terminator 6 142

Functional dependencies:
TV Channel, Program→TV Channel Genre, Program Duration
Program→Program Duration (2NF violation!)
TV Channel→TV Channel Genre (2NF violation!)

24
2 Normal form (2NF). Exercise
TV Channel/Program
Channel
Functional dependencies:
TV Channel TV Channel TV Channel Program
Only trivial (why?)
Genre
TV1 Football Match
TV1 Sport
TV1 Skiing
TV2 Movie
TV2 Terminator 6
Program
Functional dependencies:
TV Channel→TV Channel Genre Program Program Duration

Football Match 123

Functional dependencies: Skiing 75


Program→Program Duration
Terminator 6 142
25
3 Normal form (3NF)
A relation R is said to be in 3NF if both the following conditions hold:
● Table is in 2NF (Second normal form)
● Every non-prime attribute of R is non-transitively dependent on every candidate key of R.

A non-prime attribute of R is an attribute that does not belong to any candidate key of R

F3 (Transitivity): If X→Y and Y→Z, then X→Z.

All non-prime attribute of R must be functionally dependent on a primary/candidate key i.e. there can be
no interdependencies between non-prime attributes.

26
3 Normal form (3NF). Example
Student University University Location

John KPI Ukraine

Ann Oxford England

Paul Stanford USA

Functional dependencies:
Student→University, University Location
Student→University
Student→University Location
University→University Location

Student→University→University Location (transitive via non-key! 3NF violation) 27


3 Normal form (3NF). Example
Student University

Student University University University Location

John KPI KPI Ukraine

Ann Oxford Oxford England

Paul Stanford Stanford USA

Functional dependencies: Functional dependencies:


Student→University University→University Location

No transitive dependencies!

28
3 Normal form (3NF). Exercise
Books

Book Name Author Author Email Book genre

B1 A1 a1@a1.com Criminal

B2 A1 a1@a1.com Romance

B3 A2 a2@a2.com Technology

Assumption:
One book is written by one author

The task:
1) What are functional dependencies?
2) Does the relation satisfy 3NF? Why?
29
3 Normal form (3NF). Exercise
Books

Book Name Author Author Email Book genre

B1 A1 a1@a1.com Criminal

B2 A1 a1@a1.com Romance

B3 A2 a2@a2.com Technology

Functional dependencies: Transitional functional dependencies (TFD):


Book Name→Author, Author Email, Book genre Book Name→Author→Author Email
Book Name→Book genre Book Name→Author Email→Author
Book Name→Author
Book Name→Author Email Both TFDs violate 3NF!
Author→Author Email
Author Email→Author
30
3 Normal form (3NF). Exercise
Books Authors
Book Name Author Email Book genre Author Author Email

B1 a1@a1.com Criminal A1 a1@a1.com

B2 a1@a1.com Romance A2 a2@a2.com

B3 a2@a2.com Technology
Is it possible to use ‘Author’ as the PK?

Functional dependencies: Functional dependencies:


Book Name→Author Email, Book genre Author Email→Author

No transitive dependencies!

31
3 Normal form (3NF). Exercise
User

ID Username Email

1 a1 a1@a1.com

2 a2 a2@a2.com

Does the table satisfies 3NF?

Super, Candidate, Primary Keys:


???
Functional dependencies:
???

32
3 Normal form (3NF). Exercise
User

ID Username Email Username is unique - candidate key!


Email is unique - candidate key!
1 a1 a1@a1.com Keys?

2 a2 a2@a2.com

Functional dependencies:
ID→Username, Email
ID→Username
ID→Email
Username→Email
Email→Username
ID→Username→Email (transitive but not non-key! No problem)
ID→Email→Username (transitive but not non-key! No problem)
33
Homework
Student Faculty Dean Student Dean Age Student
Age Photo

Change the design of the relation performing


normalization:

1. Set the primary key, find the candidate keys


2. Find functional dependencies
3. Ensure third normal form is satisfied

34

You might also like