
Topic 7.

Normalization of the relational data model

HNEU,
Department of Information Systems,
Database Course,
V. V. Fedko
Contents
1. Anomalies when performing operations with
the database
2. Functional dependencies
3. Normal forms and normalization of relations
4. Denormalization of relations



Test questions
1. Describe the purpose of normalization.
2. What is a lossless decomposition?
3. Normalize the relation Student (#_stud_ticket, Surname, Phones, Hostel, Hostel_Address).



1. Anomalies when performing operations with the database
Examples of anomalies
Product_Sale
Sale_date | Manufacturer_name | Address | Product_name | Price | Quantity
01.09.2021 | Bakery "Saltovsky" | Kharkiv, st. Shironintsev, 1 | Bread "Ukrainian" | 13.50 | 200
01.09.2021 | Bakery "Saltovsky" | Kharkiv, st. Shironintsev, 1 | Long loaf "Milk" | 12.80 | 250
01.09.2021 | Bakery "Kulinichi" | Kharkiv, st. Grishchenko, 17 | Bun with poppy seeds | 9.00 | 150
02.09.2021 | Bakery "Saltovsky" | Kharkiv, st. Shironintsev, 1 | Bread "Ukrainian" | 13.50 | 220
02.09.2021 | Bakery "Kulinichi" | Kharkiv, st. Grishchenko, 17 | Long loaf "Milk" | 12.80 | 300
02.09.2021 | Bakery "Poltava" | Poltava, st. Kamarova, 10-A | Bun with poppy seeds | 9.00 | 100

Modification anomalies. When a manufacturer's address changes, the correction must be made in every tuple that corresponds to this manufacturer.
Deletion anomalies. When all tuples recording deliveries from one manufacturer are deleted, the name and address of that manufacturer are lost.
Insertion anomalies. When a contract has been signed with a manufacturer but nothing has been supplied yet, a complete tuple cannot be formed, so the table will contain undefined values.
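
The modification and deletion anomalies can be reproduced on a few rows of Product_Sale. The following is only a minimal sketch: the rows are held as Python dictionaries, the helper addresses_of and the new address "Kharkiv, st. New, 5" are hypothetical and introduced purely for illustration.

```python
# A few Product_Sale rows, reduced to the columns needed here.
product_sale = [
    {"Sale_date": "01.09.2021", "Manufacturer_name": 'Bakery "Saltovsky"',
     "Address": "Kharkiv, st. Shironintsev, 1", "Product_name": 'Bread "Ukrainian"'},
    {"Sale_date": "02.09.2021", "Manufacturer_name": 'Bakery "Saltovsky"',
     "Address": "Kharkiv, st. Shironintsev, 1", "Product_name": 'Bread "Ukrainian"'},
    {"Sale_date": "02.09.2021", "Manufacturer_name": 'Bakery "Poltava"',
     "Address": "Poltava, st. Kamarova, 10-A", "Product_name": "Bun with poppy seeds"},
]


def addresses_of(rows, manufacturer):
    """Return the set of distinct addresses stored for one manufacturer."""
    return {r["Address"] for r in rows if r["Manufacturer_name"] == manufacturer}


# Modification anomaly: only one of the two Saltovsky tuples is corrected.
rows = [dict(r) for r in product_sale]
rows[0]["Address"] = "Kharkiv, st. New, 5"  # hypothetical new address
inconsistent = addresses_of(rows, 'Bakery "Saltovsky"')

# Deletion anomaly: removing the only Poltava sale also removes its address.
rows_after_delete = [r for r in product_sale
                     if r["Manufacturer_name"] != 'Bakery "Poltava"']
lost = addresses_of(rows_after_delete, 'Bakery "Poltava"')

print(len(inconsistent))  # two different addresses for one manufacturer
print(lost)               # empty set: the address is no longer stored anywhere
```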



Causes of anomalies
Data redundancy. Relations have values repeated unnecessarily in one or more tuples (modification anomalies).
Functional dependencies. A relation should contain only attributes with a close logical connection; when it mixes attributes that describe different entities, deletion and insertion anomalies appear.




Disadvantages of data redundancy
1. Increases the size of the database unnecessarily.
2. Causes data inconsistency.
3. Decreases the efficiency of the database.
4. May cause data corruption.



Concept of normalization
Normalization is a database design technique that organizes relations in a manner that reduces data redundancy and undesirable dependencies.
It divides larger relations into smaller relations and links them using relationships.
Sale
Sale_date | Manufacturer_name | Product_name | Quantity
01.09.2021 | Bakery "Saltovsky" | Bread "Ukrainian" | 200
01.09.2021 | Bakery "Saltovsky" | Long loaf "Milk" | 250
01.09.2021 | Bakery "Kulinichi" | Bun with poppy seeds | 150
02.09.2021 | Bakery "Saltovsky" | Bread "Ukrainian" | 220
02.09.2021 | Bakery "Kulinichi" | Long loaf "Milk" | 300
02.09.2021 | Bakery "Poltava" | Bun with poppy seeds | 100

Manufacturer
Manufacturer_name | Address
Bakery "Saltovsky" | Kharkiv, st. Shironintsev, 1
Bakery "Kulinichi" | Kharkiv, st. Grishchenko, 17
Bakery "Poltava" | Poltava, st. Kamarova, 10-A

Product
Product_name | Price
Bread "Ukrainian" | 13.50
Long loaf "Milk" | 12.80
Bun with poppy seeds | 9.00
Normalization mechanism
Normalization of relations consists in decomposing a relation
that is in a previous normal form into two or more relations
satisfying the requirements of the next normal form.

Advantages of normalization
Normalization makes it possible to:
1) ensure that each attribute is defined for its own entity;
2) significantly reduce the amount of memory needed to store information;
3) eliminate anomalies in the organization of data storage.



2. Functional dependencies
Definition
Let R be a relation, and let X and Y be subsets of the attributes of R.
Functional dependency. Y is functionally dependent on X if and only if exactly one value of Y corresponds to each value of X: R.X → R.Y (X and Y can be composite).
It reads like this:
X functionally determines Y
or
Y functionally depends on X
X - determinant
Y - dependent
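
The definition can be turned into a simple check on the rows of a relation. This is a sketch under assumptions: fd_holds is a hypothetical helper, the rows are abbreviated sample data, and a check on sample rows can only show that a dependency is not violated by this data. Whether the dependency really holds is decided by the meaning of the attributes.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in dependent)
        # setdefault stores y for a new x and returns the stored value otherwise
        if seen.setdefault(x, y) != y:
            return False
    return True


rows = [
    {"Sale_date": "01.09.2021", "Manufacturer_name": "Saltovsky",
     "Product_name": "Bread", "Quantity": 200},
    {"Sale_date": "02.09.2021", "Manufacturer_name": "Saltovsky",
     "Product_name": "Bread", "Quantity": 220},
    {"Sale_date": "01.09.2021", "Manufacturer_name": "Kulinichi",
     "Product_name": "Bun", "Quantity": 150},
]
key = ["Sale_date", "Manufacturer_name", "Product_name"]

print(fd_holds(rows, key, ["Quantity"]))                                    # True
print(fd_holds(rows, ["Manufacturer_name", "Product_name"], ["Quantity"]))  # False
```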



Examples of FD
Product_Sale (Sale_date, Manufacturer_name, Address, Product_name, Price, Quantity)
(Sale_date, Manufacturer_name, Product_name) → (Quantity)
(Manufacturer_name, Product_name) → (Price)
(Manufacturer_name, Product_name) → (Address, Price)
(Manufacturer_name, Product_name) ↛ (Quantity)
(Quantity) ↛ (Sale_date, Manufacturer_name, Product_name)

Product_Sale
Sale_date | Manufacturer_name | Address | Product_name | Price | Quantity
01.09.2021 | Bakery "Saltovsky" | Kharkiv, st. Shironintsev, 1 | Bread "Ukrainian" | 13.50 | 200
01.09.2021 | Bakery "Saltovsky" | Kharkiv, st. Shironintsev, 1 | Long loaf "Milk" | 12.80 | 250
01.09.2021 | Bakery "Kulinichi" | Kharkiv, st. Grishchenko, 17 | Bun with poppy seeds | 9.00 | 200
02.09.2021 | Bakery "Saltovsky" | Kharkiv, st. Shironintsev, 1 | Bread "Ukrainian" | 13.50 | 220
02.09.2021 | Bakery "Kulinichi" | Kharkiv, st. Grishchenko, 17 | Long loaf "Milk" | 12.80 | 200
02.09.2021 | Bakery "Poltava" | Poltava, st. Kamarova, 10-A | Bun with poppy seeds | 9.00 | 100



Examples of trivial FD
(Manufacturer_name, Product_name) → (Manufacturer_name)
(Manufacturer_name, Product_name) → (Product_name)

A functional dependency is called trivial if and only if its right part (the dependent) is a subset of its left part (the determinant).



Full functional dependence
A functional dependency R.X → R.Y is called full if no attribute can be removed
from its determinant without breaking this dependency.
(Sale_date, Manufacturer_name, Product_name) → (Quantity)
(Manufacturer_name, Product_name) ↛ (Quantity)

Partial functional dependence


A functional dependency R.X → R.Y is a partial dependency if there is some
attribute that can be removed from R.X and yet the dependency still holds.
(Manufacturer_name, Product_name) → (Address)
(Manufacturer_name) → (Address)
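
Whether a dependency is full or partial can be tested by trying every smaller determinant. The sketch below assumes hypothetical helpers fd_holds and is_full and abbreviated rows; the third row (Kulinichi selling Bread on 01.09.2021) is invented so that the sample data does not accidentally satisfy a smaller determinant.

```python
from itertools import combinations


def fd_holds(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in dependent)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full(rows, determinant, dependent):
    """X -> Y is full if it holds and no non-empty proper subset of X determines Y."""
    if not fd_holds(rows, determinant, dependent):
        return False
    for size in range(1, len(determinant)):
        for subset in combinations(determinant, size):
            if fd_holds(rows, list(subset), dependent):
                return False  # an attribute can be removed: partial dependency
    return True


rows = [
    {"Sale_date": "01.09.2021", "Manufacturer_name": "Saltovsky",
     "Product_name": "Bread", "Address": "Kharkiv, st. Shironintsev, 1", "Quantity": 200},
    {"Sale_date": "01.09.2021", "Manufacturer_name": "Saltovsky",
     "Product_name": "Long loaf", "Address": "Kharkiv, st. Shironintsev, 1", "Quantity": 250},
    # hypothetical row, added only for this illustration
    {"Sale_date": "01.09.2021", "Manufacturer_name": "Kulinichi",
     "Product_name": "Bread", "Address": "Kharkiv, st. Grishchenko, 17", "Quantity": 180},
    {"Sale_date": "02.09.2021", "Manufacturer_name": "Saltovsky",
     "Product_name": "Bread", "Address": "Kharkiv, st. Shironintsev, 1", "Quantity": 220},
]

full = is_full(rows, ["Sale_date", "Manufacturer_name", "Product_name"], ["Quantity"])
partial = not is_full(rows, ["Manufacturer_name", "Product_name"], ["Address"])
print(full, partial)  # True True
```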



Transitive functional dependency
A functional dependency R.X → R.Y is called transitive if there is an attribute Z such that the functional dependencies R.X → R.Z and R.Z → R.Y hold, while the functional dependency R.Z → R.X does not.
R.X → R.Y: R.X → R.Z → R.Y ; R.Z ↛ R.X.

(SaleId) → (Sale_date, Manufacturer_name, Product_name, Address)

(SaleId) → (Manufacturer_name)
(Manufacturer_name) ↛ (SaleId)
(Manufacturer_name) → (Address)

(SaleId) → (Address)
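
The chain above can be followed mechanically by computing the closure of an attribute set, that is, everything it determines under a given list of functional dependencies. This is a sketch of the standard closure algorithm; the function name closure and the encoding of dependencies as pairs of tuples are choices made for this example.

```python
def closure(attributes, fds):
    """Return all attributes determined by `attributes` under the dependencies `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # if the determinant is already covered, its dependents are covered too
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    (("SaleId",), ("Sale_date", "Manufacturer_name", "Product_name")),
    (("Manufacturer_name",), ("Address",)),
]

print("Address" in closure({"SaleId"}, fds))            # True: SaleId -> Address
print("SaleId" in closure({"Manufacturer_name"}, fds))  # False
```

Address is reached from SaleId only through Manufacturer_name, which is what makes the dependency transitive.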



Mutually independent attributes
Two or more attributes are mutually independent if none of these attributes
functionally depend on the others.
(Sale_date, Manufacturer_name, Product_name)

Multivalued dependency (MVD)


A multivalued dependency is a dependency between attributes (for example, X, Y, and Z) in a relation R such that for each value of X there is a set of values for Y and a set of values for Z, and the sets of values for Y and Z are independent of each other.
R.X ↠ R.Y
R.X ↠ R.Z
R.Y ↛ R.Z



Example of MVD
(Sale_date, Manufacturer_name, Product_name)
Sale_date | Manufacturer_name | Product_name
01.09.2021 | Bakery "Saltovsky" | Bread "Ukrainian"
01.09.2021 | Bakery "Saltovsky" | Long loaf "Milk"
01.09.2021 | Bakery "Kulinichi" | Bread "Ukrainian"
02.09.2021 | Bakery "Saltovsky" | Bread "Ukrainian"
02.09.2021 | Bakery "Kulinichi" | Long loaf "Milk"
02.09.2021 | Bakery "Kulinichi" | Bread "Ukrainian"

Sale_date | Manufacturer_name
01.09.2021 | Bakery "Saltovsky"
01.09.2021 | Bakery "Kulinichi"
02.09.2021 | Bakery "Saltovsky"
02.09.2021 | Bakery "Kulinichi"

Sale_date | Product_name
01.09.2021 | Bread "Ukrainian"
01.09.2021 | Long loaf "Milk"
02.09.2021 | Bread "Ukrainian"
02.09.2021 | Long loaf "Milk"

Sale_date ↠ Manufacturer_name
Sale_date ↠ Product_name
Manufacturer_name ↛ Product_name



Decomposition
A decomposition of the relation R is the replacement of R by a set of relations {R1, R2, ..., Rn} such that each of them is a projection of R and each attribute of R is included in at least one of the projections.

Lossless-join decomposition & lossy-join decomposition
The decomposition of relation R into R1 and R2 is lossless when the natural join of R1 and R2 yields exactly the relation R; otherwise it is lossy.
R1 ⋈ R2 = R
The decomposition is a lossless-join decomposition of R if at least one of the following functional dependencies holds:
1. R1 ∩ R2 ⟶ R1
2. R1 ∩ R2 ⟶ R2



Example of decomposition 1: INFORMATION LOSS

Manufacturer_Product_Price
Manufacturer_name | Product_name | Price
Bakery "Saltovsky" | Bread "Ukrainian" | 13.00
Bakery "Saltovsky" | Long loaf "Milk" | 12.00
Bakery "Kulinichi" | Bread "Ukrainian" | 13.00

Manufacturer_Price
Manufacturer_name | Price
Bakery "Saltovsky" | 13.00
Bakery "Saltovsky" | 12.00
Bakery "Kulinichi" | 13.00

Product_Manufacturer
Manufacturer_name | Product_name
Bakery "Saltovsky" | Bread "Ukrainian"
Bakery "Saltovsky" | Long loaf "Milk"
Bakery "Kulinichi" | Bread "Ukrainian"

Manufacturer_Price ⋈ Product_Manufacturer
Manufacturer_name | Product_name | Price
Bakery "Saltovsky" | Bread "Ukrainian" | 13.00
Bakery "Saltovsky" | Long loaf "Milk" | 13.00
Bakery "Saltovsky" | Bread "Ukrainian" | 12.00
Bakery "Saltovsky" | Long loaf "Milk" | 12.00
Bakery "Kulinichi" | Bread "Ukrainian" | 13.00



Example of decomposition 2: LOSSLESS-JOIN DECOMPOSITION

Manufacturer_Product_Price
Manufacturer_name | Product_name | Price
Bakery "Saltovsky" | Bread "Ukrainian" | 13.00
Bakery "Saltovsky" | Long loaf "Milk" | 12.00
Bakery "Kulinichi" | Bread "Ukrainian" | 13.00

Product_Price
Product_name | Price
Bread "Ukrainian" | 13.00
Long loaf "Milk" | 12.00
Bread "Ukrainian" | 13.00

Product_Manufacturer
Manufacturer_name | Product_name
Bakery "Saltovsky" | Bread "Ukrainian"
Bakery "Saltovsky" | Long loaf "Milk"
Bakery "Kulinichi" | Bread "Ukrainian"

Product_Price ⋈ Product_Manufacturer
Manufacturer_name | Product_name | Price
Bakery "Saltovsky" | Bread "Ukrainian" | 13.00
Bakery "Saltovsky" | Long loaf "Milk" | 12.00
Bakery "Kulinichi" | Bread "Ukrainian" | 13.00

R1 ∩ R2 ⟶ R1
{Product_name, Price} ∩ {Product_name, Manufacturer_name} = {Product_name}
{Product_name} → {Product_name, Price}
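
Both decomposition examples can be replayed with a projection and a natural join over dictionaries. This is only an illustrative sketch: project, natural_join and as_set are hypothetical helpers written for this example, and the projection removes duplicate rows because a relation is a set of tuples.

```python
def project(rows, attrs):
    """Project dict rows onto `attrs`, removing duplicates (a relation is a set)."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(left, right):
    """Combine rows that agree on every attribute the two relations share."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


def as_set(rows):
    """Order-independent form of a relation, used to compare two relations."""
    return {tuple(sorted(row.items())) for row in rows}


R = [
    {"Manufacturer_name": "Saltovsky", "Product_name": "Bread", "Price": 13.00},
    {"Manufacturer_name": "Saltovsky", "Product_name": "Long loaf", "Price": 12.00},
    {"Manufacturer_name": "Kulinichi", "Product_name": "Bread", "Price": 13.00},
]

# Decomposition 1: the common attribute is Manufacturer_name.
joined_1 = natural_join(project(R, ["Manufacturer_name", "Price"]),
                        project(R, ["Manufacturer_name", "Product_name"]))
# Decomposition 2: the common attribute is Product_name.
joined_2 = natural_join(project(R, ["Product_name", "Price"]),
                        project(R, ["Manufacturer_name", "Product_name"]))

print(as_set(joined_1) == as_set(R), len(as_set(joined_1)))  # False 5 (spurious tuples)
print(as_set(joined_2) == as_set(R))                         # True
```

The first join returns five tuples instead of three; the two extra tuples are the information loss, because it is no longer known which price belongs to which product.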
Keys & dependencies
Product (ProductId, Product_name, Price, Purchase_price):
(ProductId) → (Product_name)
(ProductId) → (Price, Purchase_price)
(ProductId, Product_name) → (Price, Purchase_price)
Conclusion: If the determinant contains a primary key, then the set of all other
attributes of the relation functionally depends on it.

Candidate keys & dependencies


Candidate = Potential
(ProductId) → (Price, Purchase_price)
(Product_name) → (Price, Purchase_price)
Conclusion: The same is true not only for primary keys but also for alternate keys, that is, for all candidate keys.
Superkey
A superkey is a subset of a relation's attributes that satisfies the requirement of uniqueness.
(ProductId, Product_name) → (Price, Purchase_price)
A superkey differs from a candidate key in that the superkey does not have to be minimal (irreducible). Minimality means that the key does not contain a smaller subset of attributes that also satisfies the condition of uniqueness.
As a result, a superkey may contain another, more "compact" superkey with fewer attributes.
(ProductId) → (Price, Purchase_price)
(Product_name) → (Price, Purchase_price)
___________________________________
(Sale_date, Manufacturer_name, Address, Product_name, Price) → Quantity
(Sale_date, Manufacturer_name, Product_name) → Quantity
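
The difference between a superkey and a candidate key can be checked with attribute closures. The sketch below makes assumptions: closure, is_superkey and is_candidate_key are hypothetical helpers, and the dependency Product_name → ProductId is assumed (product names are taken to be unique) so that Product_name acts as an alternate key.

```python
def closure(attributes, fds):
    """Return all attributes determined by `attributes` under the dependencies `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """A superkey determines every attribute of the relation."""
    return closure(attrs, fds) >= set(all_attrs)


def is_candidate_key(attrs, all_attrs, fds):
    """A candidate key is a superkey from which no attribute can be removed."""
    if not is_superkey(attrs, all_attrs, fds):
        return False
    return all(not is_superkey(set(attrs) - {a}, all_attrs, fds) for a in attrs)


product = ("ProductId", "Product_name", "Price", "Purchase_price")
fds = [
    (("ProductId",), ("Product_name", "Price", "Purchase_price")),
    (("Product_name",), ("ProductId",)),  # assumption: product names are unique
]

print(is_superkey({"ProductId", "Product_name"}, product, fds))       # True
print(is_candidate_key({"ProductId", "Product_name"}, product, fds))  # False
print(is_candidate_key({"ProductId"}, product, fds))                  # True
print(is_candidate_key({"Product_name"}, product, fds))               # True
```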
3. Normal forms and normalization of relations
Normal forms
Normalization is a process for evaluating and correcting table structures to minimize
data redundancies, thereby reducing the likelihood of data anomalies.
Normalization is a formal technique for analysing relations based on their primary key
(or candidate keys) and functional dependencies.
A relation is said to be in a given normal form (NF) if it satisfies the set of conditions defined for that form.
Normalization uses a series of tests (described as normal forms) to help identify the optimal grouping of attributes and, ultimately, a set of suitable relations that supports the data requirements of the enterprise.
Normalization is a formal technique that can be used at any stage of database design.



1NF
• the data is in table format,
• a primary key (PK) is identified,
• the values in each column of a table are atomic.

Manufacturer (not in 1NF: Phones holds several values)
Manufacturer_name | Phones
Bakery "Kulinichi" | 057222222, 066222277
Bakery "Poltava" | 053666666

Manufacturer (in 1NF)
Manufacturer_name | Phone
Bakery "Kulinichi" | 057222222
Bakery "Kulinichi" | 066222277
Bakery "Poltava" | 053666666

City in the address
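
Bringing the Manufacturer table to 1NF amounts to emitting one row per phone number. A minimal sketch, assuming the phones are stored as a comma-separated string; the function name to_1nf is hypothetical.

```python
manufacturer = [
    {"Manufacturer_name": 'Bakery "Kulinichi"', "Phones": "057222222, 066222277"},
    {"Manufacturer_name": 'Bakery "Poltava"', "Phones": "053666666"},
]


def to_1nf(rows):
    """Emit one row per phone so that every Phone value is atomic."""
    return [
        {"Manufacturer_name": row["Manufacturer_name"], "Phone": phone.strip()}
        for row in rows
        for phone in row["Phones"].split(",")
    ]


manufacturer_1nf = to_1nf(manufacturer)
for row in manufacturer_1nf:
    print(row["Manufacturer_name"], row["Phone"])
```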



2NF
A relation is in the second normal form (2NF) if and only if
• it is in the 1NF,
• each non-key attribute is irreducibly (fully functionally) dependent on the primary key.
This condition matters only when the primary key is composite.

Sale
Sale_date | Seller | Manufacturer_name | Product_name | Quantity
01.09.2021 | Holmes | Bakery "Saltovsky" | Bread "Ukrainian" | 200
01.09.2021 | Holmes | Bakery "Saltovsky" | Long loaf "Milk" | 250
01.09.2021 | Holmes | Bakery "Kulinichi" | Bun with poppy seeds | 150
02.09.2021 | Watson | Bakery "Saltovsky" | Bread "Ukrainian" | 220
02.09.2021 | Watson | Bakery "Kulinichi" | Long loaf "Milk" | 300
02.09.2021 | Watson | Bakery "Poltava" | Bun with poppy seeds | 100

Before decomposition:
(Sale_date, Manufacturer_name, Product_name) → (Quantity, Seller)

After decomposition:
(Sale_date, Manufacturer_name, Product_name) → Quantity, Sale_date → Seller
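
The step to 2NF can be sketched at the schema level: find the dependencies whose determinant is only a part of the key and move their dependent attributes into separate relations. The helpers partial_dependencies and decompose_2nf are hypothetical, and the sketch looks only at the dependencies that are listed explicitly, not at those that could be derived from them.

```python
def partial_dependencies(key, fds):
    """Return FDs whose determinant is a proper subset of the key
    and that determine at least one non-key attribute."""
    key = set(key)
    return [(lhs, rhs) for lhs, rhs in fds
            if set(lhs) < key and set(rhs) - key]


def decompose_2nf(attributes, key, fds):
    """Split off one relation per partial dependency; the rest stays with the key."""
    violations = partial_dependencies(key, fds)
    moved = {a for _, rhs in violations for a in rhs}
    relations = [tuple(a for a in attributes if a not in moved)]
    relations += [tuple(lhs) + tuple(rhs) for lhs, rhs in violations]
    return relations


attributes = ("Sale_date", "Seller", "Manufacturer_name", "Product_name", "Quantity")
key = ("Sale_date", "Manufacturer_name", "Product_name")
fds = [
    (key, ("Quantity",)),
    (("Sale_date",), ("Seller",)),  # Seller depends on only a part of the key
]

for relation in decompose_2nf(attributes, key, fds):
    print(relation)
```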
3NF
A relation is in the third normal form if and only if
• it is in the 2NF,
• every non-key attribute is non-transitively dependent on the primary key.

"Nothing but the key"

No non-prime attribute depends on other non-prime attributes.
All non-prime attributes must depend on the primary key only.



3NF Example
Sale
Sale_date | Seller | Seller_phone | Manufacturer_name | Product_name | Quantity
01.09.2021 | Holmes | 09511122233 | Bakery "Saltovsky" | Bread "Ukrainian" | 200
01.09.2021 | Holmes | 09511122233 | Bakery "Saltovsky" | Long loaf "Milk" | 250
01.09.2021 | Holmes | 09511122233 | Bakery "Kulinichi" | Bun with poppy seeds | 150
02.09.2021 | Watson | 06744455566 | Bakery "Saltovsky" | Bread "Ukrainian" | 220
02.09.2021 | Watson | 06744455566 | Bakery "Kulinichi" | Long loaf "Milk" | 300
02.09.2021 | Watson | 06744455566 | Bakery "Poltava" | Bun with poppy seeds | 100

1. (Sale_date, Seller, Seller_phone, Manufacturer_name, Product_name) → Quantity

2. (Sale_date, Manufacturer_name, Product_name) → Quantity, Sale_date → (Seller, Seller_phone)

3. (Sale_date, Manufacturer_name, Product_name) → Quantity, Sale_date → Seller, Seller → Seller_phone
Boyce-Codd normal form (BCNF)

A relation is in the Boyce-Codd Normal Form (BCNF) if and only if every determinant is a
candidate key.
Applies when:
• the relation may have two or more candidate (potential) keys;
• the candidate keys may be composite rather than simple, that is, include several attributes;
• composite candidate keys may overlap (have one or more attributes in common).
BCNF is sometimes also called 3.5 normal form.
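
A BCNF check can be sketched with attribute closures. Note the assumption: instead of testing "every determinant is a candidate key" literally, this sketch uses the common equivalent formulation that the determinant of every nontrivial dependency must be a superkey. The helper names and the example dependencies (taken from a Sale relation with Seller and Seller_phone) are illustrative.

```python
def closure(attributes, fds):
    """Return all attributes determined by `attributes` under the dependencies `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the nontrivial dependencies whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)
            and not closure(lhs, fds) >= set(all_attrs)]


sale = ("Sale_date", "Seller", "Seller_phone",
        "Manufacturer_name", "Product_name", "Quantity")
fds = [
    (("Sale_date", "Manufacturer_name", "Product_name"), ("Quantity",)),
    (("Sale_date",), ("Seller",)),
    (("Seller",), ("Seller_phone",)),
]

violations = bcnf_violations(sale, fds)
for lhs, rhs in violations:
    print(lhs, "->", rhs)

# After decomposition, the small relation (Sale_date, Seller) has no violations.
print(bcnf_violations(("Sale_date", "Seller"), [(("Sale_date",), ("Seller",))]))  # []
```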

4NF

A relation is in the Fourth Normal Form (4NF) if and only if for every nontrivial multi-valued
dependency A ↠ B, A is a candidate key of the relation.

5NF
A relation is in the Fifth Normal Form (5NF) if and only if for every join dependency (R1, R2, ..., Rn) in a relation R, each projection includes a candidate key of the original relation.
4. Denormalization of relations
Purpose of denormalization
Denormalization is a strategy used on a previously-normalized database to increase
performance.
Denormalization is the process of trying to improve the read performance of a
database, at the expense of losing some write performance, by adding redundant
copies of data or by grouping data.
If efficiency is more important than the flexibility and size of the database, denormalization can be performed, that is, the inverse transformation, in which linked tables are joined for more efficient access.
Denormalization is performed only after the database has been normalized completely and correctly and the bottlenecks in the schema that reduce the efficiency of query execution have been identified.



Star
If queries to the database often have to join a large number of tables, it is better to combine several small, closely related tables that hold rarely changed information (conditionally constant or reference data, that is, dimensions) into one table, for example country - region - city. The result is a star schema.

Normalized schema:
Invoice (InvoiceId, InvoiceDate, InvoiceNumber, ManufacturerId (FK))
InvoiceProduct (InvoiceProductId, Quantity, InvoiceId (FK), ProductId (FK))
Product (ProductId, ProductName, Price, PurchasePrice, ProductGroupId (FK))
ProductGroup (ProductGroupId, ProductGroupName)
Manufacturer (ManufacturerId, ManufacturerName, Address, Phone, CityId (FK))
City (CityId, CityName, RegionId (FK))
Region (RegionId, RegionName)

Star schema:
FactReceipts (ManufacturerId (FK), DateId (FK), ProductId (FK), Quantity, Cost)
DimDate (DateId, CalendarDate, CalendarYear, MonthNumberOfYear, DayNumberOfMonth)
DimProduct (ProductId, ProductName, Price, PurchasePrice, ProductGroupName)
DimManufacturer (ManufacturerId, ManufacturerName, CityName, RegionName)
Aggregate functions
Queries that calculate indicators from the information stored in the database, especially those that use grouping and aggregate functions (COUNT, MAX, SUM, etc.), often take a long time to run. Adding a column that stores the precalculated values can save significant time when the query is executed, but the data in this column must be updated promptly (for example, a stock balance).

InvoiceProduct (…, ProductId, Quantity)
Sale (…, ProductId, Quantity)
Product (ProductId, ProductName, Price, PurchasePrice, StockBalance)
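
The trade-off of a precalculated column can be sketched in a few lines. This example assumes that the stock balance equals the total quantity received (InvoiceProduct) minus the total quantity sold (Sale); the functions receive, sell and computed_balance and the in-memory lists are hypothetical stand-ins for the tables.

```python
products = {1: {"ProductName": 'Bread "Ukrainian"', "StockBalance": 0}}
invoice_product = []  # receipts as (ProductId, Quantity)
sale = []             # sales as (ProductId, Quantity)


def receive(product_id, quantity):
    """Record a receipt and keep the redundant StockBalance column in step."""
    invoice_product.append((product_id, quantity))
    products[product_id]["StockBalance"] += quantity


def sell(product_id, quantity):
    """Record a sale and keep the redundant StockBalance column in step."""
    sale.append((product_id, quantity))
    products[product_id]["StockBalance"] -= quantity


def computed_balance(product_id):
    """The aggregate query that the stored column replaces: SUM(received) - SUM(sold)."""
    received = sum(q for pid, q in invoice_product if pid == product_id)
    sold = sum(q for pid, q in sale if pid == product_id)
    return received - sold


receive(1, 200)
sell(1, 50)
sell(1, 30)
print(products[1]["StockBalance"], computed_balance(1))  # 120 120
```

Reading StockBalance is a single lookup, while every write now has two things to update; if one update is skipped, the stored value and the computed value diverge.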



Blob
If the database has tables with a large number of records that contain large fields, such as Blob, querying such a table can be made significantly faster by moving these fields to a separate table linked to the original one by a 1:1 relationship. The point of this separation is that ordinary queries rarely need the information in such fields, so the much smaller original table can be processed considerably faster.

Before:
Film (FilmId, FilmName, Year, Rating, Video)

After:
Film (FilmId, FilmName, Year, Rating)
Movie (FilmId (FK), Video)
Film 1 : 1 Movie
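
The split can be tried out with Python's built-in sqlite3 module. This is a sketch rather than a recommended schema: the column types, the sample film and the 1024-byte placeholder video are assumptions made for the example. Using FilmId as both primary key and foreign key of Movie is one way to obtain the 1:1 relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Film (
        FilmId   INTEGER PRIMARY KEY,
        FilmName TEXT,
        Year     INTEGER,
        Rating   REAL
    );
    CREATE TABLE Movie (
        FilmId INTEGER PRIMARY KEY REFERENCES Film(FilmId),  -- PK + FK gives 1:1
        Video  BLOB
    );
""")

# Hypothetical sample data; the "video" is just 1024 zero bytes.
conn.execute("INSERT INTO Film VALUES (1, 'Example film', 2021, 7.5)")
conn.execute("INSERT INTO Movie VALUES (1, ?)", (b"\x00" * 1024,))

# Ordinary queries read only the small Film table.
listing = conn.execute("SELECT FilmName, Year FROM Film").fetchall()

# The large field is fetched only when it is actually needed.
video = conn.execute("SELECT Video FROM Movie WHERE FilmId = ?", (1,)).fetchone()[0]

print(listing, len(video))
```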



Test questions
1. Describe the purpose of normalization.
2. What is a lossless-join decomposition?
3. Normalize the relation Student (#_stud_ticket, Surname, Phones, Hostel, Hostel_Address).
