Ch.5.

Data Normalization

Chapter 5
Data Normalization

4/24/2023 1

Relation
• A named, two-dimensional table of data (and hence, often called table)
• Consists of rows (records) and columns (attributes or fields)
• Requirements for a table to qualify as a relation:
  • It must have a unique name, representing a unique entity (e.g., customer)
  • Every attribute value must be atomic (not multivalued, not composite)
  • Every row must be unique (no duplicated rows)
  • Attributes (columns) in tables must have unique names (no duplicated columns)
  • The order of the columns and rows must be irrelevant (i.e., although the order of rows and/or columns changes, the table should maintain its integrity)
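
Most of the requirements above can be checked mechanically. The following is a minimal sketch, assuming a table is given as a list of column names plus a list of row tuples; the function name `qualifies_as_relation` and the sample values in the usage note are hypothetical and not part of the course material. The "order is irrelevant" requirement is a property of how the table is used, so it is not checked here.

```python
def qualifies_as_relation(columns, rows):
    """Return the list of violated requirements (an empty list means none were found)."""
    problems = []
    # Attributes (columns) must have unique names.
    if len(set(columns)) != len(columns):
        problems.append("duplicated column names")
    # Every attribute value must be atomic; in this sketch any Python
    # container value (list, set, tuple, dict) counts as non-atomic.
    if any(isinstance(v, (list, set, tuple, dict)) for row in rows for v in row):
        problems.append("non-atomic value")
    # Every row must be unique; repr() is used so rows holding lists can be compared.
    if len(set(map(repr, rows))) != len(rows):
        problems.append("duplicated rows")
    return problems
```

For example, `qualifies_as_relation(["ID", "Skills"], [(1, ["Word", "Excel"])])` reports a non-atomic value, because the skills cell holds several values at once.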

Data Normalization
• The process to eliminate data redundancy and enhance data integrity, while ensuring that the database system is easy to navigate
• Often creates a table for an associative entity between many-to-many relations, which create data redundancy and difficulties in inserting, deleting, and updating data
• Starts with transforming ER diagrams to relations (i.e., tables) and establishing relationships between the relations, and then decomposes relations with anomalies (i.e., irregularities) to produce smaller, well-structured relations

Section 1
Transform Entities to Relations

Transform ER Diagram to Relations
• Symbols and Meanings in ER Diagrams
[Figure: ER diagram legend showing the symbols for entity name, primary key, simple single-valued attribute, multi-valued attribute, and composite attribute]

Transform ER Diagram to Relations
• Symbols and Meanings in Relations
[Figure: relation notation legend showing the symbols for primary key, foreign key, and regular attribute]

Transform ER Diagram to Relations: Simple Attributes
• Simple attributes map directly onto the relation
[Figure: CUSTOMER entity type (allowing space in attribute names) and the resulting CUSTOMER relation (NOT allowing space in attribute names)]

Transform ER Diagram to Relations: Composite Attributes
• Composite attributes map to the relations, using only their component attributes
[Figure: CUSTOMER entity type with composite attribute, and the resulting CUSTOMER relation with address detail]

Transform ER Diagram to Relations: Multivalued Attributes
• Multivalued attributes become a separate relation with a foreign key taken from the superior entity
[Figure: EMPLOYEE entity type with multivalued attribute, and the resulting EMPLOYEE and EMPLOYEE SKILL relations]

Transform ER Diagram to Relations: Multivalued Attributes

EMPLOYEE with multivalued attribute
Employee ID   Employee Last Name   Employee First Name   MS Office Skill
333-33-3333   Simpson              Alice                 Word, Excel, PowerPoint
111-11-1111   Sanders              Ned                   Word, Excel
123-45-6789   Moore                Tom                   Excel

EMPLOYEE
Employee ID   Employee Last Name   Employee First Name
333-33-3333   Simpson              Alice
111-11-1111   Sanders              Ned
123-45-6789   Moore                Tom

EMPLOYEE SKILL (MS Office Skill is now an atomic, undividable attribute)
Employee ID   MS Office Skill
333-33-3333   Word
333-33-3333   Excel
333-33-3333   PowerPoint
111-11-1111   Word
111-11-1111   Excel
123-45-6789   Excel
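
The split shown on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the multivalued cell is stored as a comma-separated string; the function name `split_multivalued` is hypothetical, and the rows are the three employees from the example.

```python
# Rows of the EMPLOYEE table with the multivalued attribute:
# (Employee ID, last name, first name, comma-separated MS Office skills)
employees_raw = [
    ("333-33-3333", "Simpson", "Alice", "Word, Excel, PowerPoint"),
    ("111-11-1111", "Sanders", "Ned", "Word, Excel"),
    ("123-45-6789", "Moore", "Tom", "Excel"),
]

def split_multivalued(rows):
    """Split the multivalued skill column into a separate EMPLOYEE SKILL relation."""
    # EMPLOYEE keeps only the single-valued attributes.
    employee = [(emp_id, last, first) for emp_id, last, first, _skills in rows]
    # EMPLOYEE SKILL gets one row per (employee, skill) pair;
    # Employee ID is the foreign key taken from the superior entity.
    employee_skill = [
        (emp_id, skill.strip())
        for emp_id, _last, _first, skills in rows
        for skill in skills.split(",")
    ]
    return employee, employee_skill

employee, employee_skill = split_multivalued(employees_raw)
print(len(employee), len(employee_skill))  # 3 6
```

Every cell in `employee_skill` now holds exactly one skill, which is what makes the attribute atomic.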

Transform ER Diagram to Relations: Weak Entities
• Weak entities without a unique identifier become a separate relation with a foreign key taken from the superior entity
• Primary key of a new relation is composed of partial identifiers of weak entity and the foreign key
[Figure: weak entity DEPENDENT, which has no unique identifier, and the relations resulting from the weak entity; the new relation has a composite primary key including FirstName, MiddleInitial, LastName, EmployeeID]
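
As a rough illustration, the DEPENDENT relation could be declared as below using Python's built-in `sqlite3` module. This is a sketch under assumptions: the column types, the `EmployeeName` attribute, and the sample rows are hypothetical; only the four key attributes come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE EMPLOYEE (
    EmployeeID   TEXT PRIMARY KEY,
    EmployeeName TEXT                      -- hypothetical attribute
);
CREATE TABLE DEPENDENT (
    FirstName     TEXT NOT NULL,
    MiddleInitial TEXT NOT NULL,
    LastName      TEXT NOT NULL,
    EmployeeID    TEXT NOT NULL REFERENCES EMPLOYEE(EmployeeID),
    -- partial identifiers of the weak entity plus the foreign key
    PRIMARY KEY (FirstName, MiddleInitial, LastName, EmployeeID)
);
INSERT INTO EMPLOYEE VALUES ('E1', 'Alice');
INSERT INTO DEPENDENT VALUES ('Sam', 'J', 'Doe', 'E1');
""")

def try_insert(row):
    """Try to add a DEPENDENT row; report whether the constraints accepted it."""
    try:
        conn.execute("INSERT INTO DEPENDENT VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert(("Sam", "J", "Doe", "E1"))  # same composite key again
orphan_ok = try_insert(("Sam", "J", "Doe", "E9"))     # no such EMPLOYEE row
print(duplicate_ok, orphan_ok)  # False False
```

The first rejected row violates the composite primary key; the second violates the foreign key, since a dependent cannot exist without its employee.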

Transform ER Diagram to Relations: Unary Relationship
• One-to-many unary relationships become relations with a recursive foreign key in the same relation
[Figure: EMPLOYEE entity with unary relationship, and the EMPLOYEE relation with a recursive foreign key added, taken from the primary key (i.e., EmployeeID)]
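
A recursive foreign key can be sketched with `sqlite3` as follows. The column names `Name` and `ManagerID` and the sample rows are assumptions for illustration; the slide only states that the foreign key is taken from EmployeeID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("""
CREATE TABLE EMPLOYEE (
    EmployeeID TEXT PRIMARY KEY,
    Name       TEXT,                                 -- hypothetical attribute
    ManagerID  TEXT REFERENCES EMPLOYEE(EmployeeID)  -- recursive foreign key (hypothetical name)
)""")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("E1", "Alice", None), ("E2", "Ned", "E1"), ("E3", "Tom", "E1")],
)

# Self-join: pair each employee with the manager row stored in the same relation.
pairs = conn.execute("""
    SELECT e.Name, m.Name
    FROM EMPLOYEE AS e
    JOIN EMPLOYEE AS m ON e.ManagerID = m.EmployeeID
    ORDER BY e.EmployeeID
""").fetchall()
print(pairs)  # [('Ned', 'Alice'), ('Tom', 'Alice')]
```

The employee at the top of the hierarchy has a NULL `ManagerID`, so the one-to-many relationship stays optional on that side.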


Section 2
Mapping Relations

Mapping Binary One-to-One Relationships
• Primary key on mandatory side becomes a foreign key on optional side
[Figure: one-to-one relationship between NURSE and CARE CENTER, and the relations between NURSE and CARE CENTER with foreign key; add a foreign key when it does not exist in the ER diagram]

Mapping Binary One-to-Many Relationships
• Primary key on the one side becomes a foreign key on the many side (cf. the similar rule for one-to-one relationships)
[Figure: one-to-many relationship between CUSTOMER and ORDER, and the relations between CUSTOMER and ORDER with foreign key; add a foreign key when it does not exist in the ER diagram]

Mapping Binary Many-to-Many Relationships
• Create a new relation with the primary keys of the two entities as its composite primary key
[Figure: a new relation (i.e., associative entity) placed between the two entities]
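
The rule above can be sketched with `sqlite3`. The relation names STUDENT, COURSE, and REGISTRATION are hypothetical and chosen only for illustration (the slide's own diagram is not reproduced in the text); the sample values borrow from the student and course examples used later in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE STUDENT (StudentID TEXT PRIMARY KEY, LastName TEXT);
CREATE TABLE COURSE  (CourseNo  TEXT PRIMARY KEY, Day TEXT);
-- Associative relation: its primary key combines the two entities' primary keys.
CREATE TABLE REGISTRATION (
    StudentID TEXT NOT NULL REFERENCES STUDENT(StudentID),
    CourseNo  TEXT NOT NULL REFERENCES COURSE(CourseNo),
    PRIMARY KEY (StudentID, CourseNo)
);
INSERT INTO STUDENT VALUES ('333-33-3333', 'Simpson'), ('123-45-6789', 'Moore');
INSERT INTO COURSE  VALUES ('ACCT-3603', 'M'), ('FIN-3213', 'Th');
INSERT INTO REGISTRATION VALUES
    ('333-33-3333', 'ACCT-3603'),
    ('333-33-3333', 'FIN-3213'),
    ('123-45-6789', 'FIN-3213');
""")

# One student has many courses and one course has many students,
# yet each fact is stored exactly once in REGISTRATION.
students_in_fin = [row[0] for row in conn.execute(
    "SELECT StudentID FROM REGISTRATION WHERE CourseNo = 'FIN-3213' ORDER BY StudentID"
)]
print(students_in_fin)  # ['123-45-6789', '333-33-3333']
```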

Mapping Supertype/Subtype Relationships
• Primary key of supertype relation becomes primary key of subtype relation

Section 3
Data Normalization

Data Normalization
• Validates and improves a logical design so that it satisfies certain constraints that avoid data anomalies and increase data integrity, which improves the performance of queries
• Decomposes relations with anomalies (i.e., redundancies) into smaller, well-structured relations in first, second, and third normal forms

Degree of Normalization: Normal Forms
• According to the degree of normalization, relations are categorized as different levels of normal forms, such as first (1NF), second (2NF), and third (3NF) normal forms
• Relations in a higher normal form tend to have better data accuracy and integrity and, hence, quality
Higher is better!

How to Make First Normal Form (1NF)
• A relation is in 1NF when it has (1) a primary key, which uniquely defines each row, and (2) a single value at the intersection of each row and column of the table (i.e., no missing values and no multivalued attributes)
• Hence, the example below is NOT in 1NF (see the next slide for the first normal form of this table)
[Figure: example table that is NOT in first normal form, because of an incomplete primary key and missing values in the relation]

How to Make First Normal Form (1NF), cont’d
• The table below is in the first normal form with a composite primary key (OrderID + ProductID) and single values in each cell
[Figure: the example table in first normal form, with a complete primary key and WITHOUT missing values in the relation]

Problems in First Normal Form (1NF)
• However, to make a first normal form, unnecessary redundancies are introduced as below
• Such redundancies are problematic, because they can lead to data anomalies
[Figure: the 1NF example table with its data redundancies highlighted]

Data Anomalies
• Indicate inconsistencies in the data stored in a database as a result of an operation such as update, insertion, and/or deletion
• Primary reason for inefficient data processing and missing values or errors in the data for data analysis
• Can arise when a particular record is stored in multiple relations but not all of the copies are equally updated, inserted, or deleted
Inconsistency!

Data Anomalies: Update Anomalies
• Update anomalies: data inconsistencies caused by partial update of data

Alice recently married, changing her last name from Simpson to Hopkins. However, the update is only applied to one of the two instances, causing an update anomaly.

Student ID    Last Name   First Name   Course No.   Section   Day   Time
333-33-3333   Hopkins     Alice        ACCT-3603    1         M     9:00 AM
333-33-3333   Simpson     Alice        FIN-3213     3         Th    11:00 AM   (not updated)
111-11-1111   Sanders     Ned          ACCT-3433    2         T     10:00 AM
111-11-1111   Sanders     Ned          MGMT-3021    5         W     8:00 AM
123-45-6789   Moore       Tom          FIN-3213     3         Th    11:00 AM
123-45-6789   Moore       Tom          ACCT-3603    1         M     9:00 AM
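
An update anomaly like the one above can be detected by looking for a Student ID that appears with more than one name. The snippet below is a minimal sketch; the function name `inconsistent_students` is hypothetical, and only the four columns needed for the check are kept from the table.

```python
from collections import defaultdict

# (Student ID, Last Name, First Name, Course No.) from the table above
rows = [
    ("333-33-3333", "Hopkins", "Alice", "ACCT-3603"),
    ("333-33-3333", "Simpson", "Alice", "FIN-3213"),
    ("111-11-1111", "Sanders", "Ned", "ACCT-3433"),
    ("111-11-1111", "Sanders", "Ned", "MGMT-3021"),
    ("123-45-6789", "Moore", "Tom", "FIN-3213"),
    ("123-45-6789", "Moore", "Tom", "ACCT-3603"),
]

def inconsistent_students(rows):
    """Return the Student IDs that appear with more than one (last, first) name."""
    names = defaultdict(set)
    for student_id, last, first, _course in rows:
        names[student_id].add((last, first))
    return sorted(sid for sid, seen in names.items() if len(seen) > 1)

anomalies = inconsistent_students(rows)
print(anomalies)  # ['333-33-3333']
```

If the name were stored once, in its own relation, there would be no second copy to forget.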

Data Anomalies: Insert Anomalies
• Insert anomalies: data inconsistencies caused by inserting incomplete rows

Student ID    Last Name   First Name   Course No.   Section   Day   Time
333-33-3333   Simpson     Alice        ACCT-3603    1         M     9:00 AM
333-33-3333   Simpson     Alice        FIN-3213     3         Th    11:00 AM
111-11-1111   Sanders     Ned          ACCT-3433    2         T     10:00 AM
111-11-1111   Sanders     Ned          MGMT-3021    5         W     8:00 AM
123-45-6789   Moore       Tom
123-45-6789   Moore       Tom

Tom was recently added as a new student but has registered for no course, causing an insert anomaly.

Data Anomalies: Deletion Anomalies
• Deletion anomalies: data inconsistencies caused by partial deletion of data

Student ID    Last Name   First Name   Course No.   Section   Day   Time
333-33-3333   Simpson     Alice        ACCT-3603    1         M     9:00 AM
333-33-3333   Simpson     Alice        FIN-3213     3         Th    11:00 AM
111-11-1111   Sanders     Ned          ACCT-3433    2         T     10:00 AM
111-11-1111   Sanders     Ned          MGMT-3021    5         W     8:00 AM
123-45-6789   Moore       Tom          FIN-3213     3         Th    11:00 AM
123-45-6789   Moore       Tom          ACCT-3603    1         M     9:00 AM

Assume Tom has decided to withdraw from all courses for this semester and return next semester. In this case, if the course information is deleted, his course registration data will be different from the others, causing a deletion anomaly in the data.

Reason for the Issues in First Normal Form (1NF)
• Partial Dependency
▪ Indicates a situation in which some, but not all, non-key attributes are functionally dependent on part of the primary key
▪ To change a relation in first normal form to the second, the partial dependency should be removed

Student ID    Last Name   First Name   Course No.   Section   Day   Time
333-33-3333   Simpson     Alice        ACCT-3603    1         M     9:00 AM
333-33-3333   Simpson     Alice        FIN-3213     3         Th    11:00 AM
111-11-1111   Sanders     Ned          ACCT-3433    2         T     10:00 AM
111-11-1111   Sanders     Ned          MGMT-3021    5         W     8:00 AM
123-45-6789   Moore       Tom          FIN-3213     3         Th    11:00 AM
123-45-6789   Moore       Tom          ACCT-3603    1         M     9:00 AM

Last Name and First Name depend on Student ID; Section, Day, and Time depend on Course No.
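
A functional dependency X → Y holds in a table when any two rows that agree on X also agree on Y. The sketch below checks this on the six rows above; the helper name `fd_holds` and the column identifiers are hypothetical, and the composite key (StudentID, CourseNo) is an assumption read off the slide. Note that sample data can only refute a dependency, never prove that it holds in general.

```python
COLS = ["StudentID", "LastName", "FirstName", "CourseNo", "Section", "Day", "Time"]
ROWS = [
    ("333-33-3333", "Simpson", "Alice", "ACCT-3603", 1, "M", "9:00 AM"),
    ("333-33-3333", "Simpson", "Alice", "FIN-3213", 3, "Th", "11:00 AM"),
    ("111-11-1111", "Sanders", "Ned", "ACCT-3433", 2, "T", "10:00 AM"),
    ("111-11-1111", "Sanders", "Ned", "MGMT-3021", 5, "W", "8:00 AM"),
    ("123-45-6789", "Moore", "Tom", "FIN-3213", 3, "Th", "11:00 AM"),
    ("123-45-6789", "Moore", "Tom", "ACCT-3603", 1, "M", "9:00 AM"),
]

def fd_holds(determinant, dependent):
    """True if rows that agree on the determinant columns also agree on the dependent columns."""
    seen = {}
    for row in ROWS:
        record = dict(zip(COLS, row))
        key = tuple(record[c] for c in determinant)
        value = tuple(record[c] for c in dependent)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Each part of the composite key determines its own group of attributes.
print(fd_holds(["StudentID"], ["LastName", "FirstName"]))   # True
print(fd_holds(["CourseNo"], ["Section", "Day", "Time"]))   # True
print(fd_holds(["StudentID"], ["CourseNo"]))                # False
```

The first two results are the partial dependencies named on the slide: each group of non-key attributes depends on only one part of the key.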

How to Make Second Normal Form (2NF)
• In addition to 1NF, every non-key attribute is fully functionally dependent on the primary key (i.e., NO partial functional dependencies)
[Figure: the example relation with two callouts. OrderDate is uniquely determined by OrderID and has nothing to do with the ProductID; ProductStandardPrice is uniquely determined by ProductID and has nothing to do with the OrderID]
• Two Partial Dependencies in the Example (thus, NOT in 2NF)
▪ OrderID → OrderDate, CustomerID, CustomerName, CustomerAddress
▪ ProductID → ProductDescription, ProductFinish, ProductStandardPrice

How to Make Second Normal Form (2NF), cont’d
• Why Do Partial Dependencies Matter?

Order ID   Order Date   Customer ID   Customer Name   Customer Address   Product ID   Product Description   Product Finish   Standard Price   Ordered Quantity
OD1        20220929     C1            Nick            address_1          P1           GoodProduct           GoodFinish       10               5
OD2        20220929     C2            Tom             address_2          P1           GoodProduct           GoodFinish       10               10
OD3        20220929     C3            Harry           address_3          P1           GoodProduct           GoodFinish       10               20
OD4        20220929     C4            Andrea          address_4          P2           BetterProduct         BetterFinish     20               30
OD5        20220930     C1            Nick            address_1          P2           BetterProduct         BetterFinish     20               10
OD6        20220930     C2            Tom             address_2          P2           BetterProduct         BetterFinish     20               20
OD7        20220930     C3            Harry           address_3          P3           BestProduct           BestFinish       30               30
OD8        20220930     C4            Andrea          address_4          P3           BestProduct           BestFinish       30               10

Partial dependencies introduce data redundancies, decreasing data processing efficiency.

How to Make Second Normal Form (2NF), cont’d
• Removing Partial Dependencies
1. Create a new relation for each primary key attribute (or combination of attributes), which will be the primary key in the new relation
2. Move the non-key attributes that are only dependent on this primary key attribute (or attributes) from the original relation to the new relation(s)
3. Reorganize the initial relation with its primary key attribute(s)
[Figure: INVOICE decomposed into PRODUCT, CUSTOMER ORDER, and ORDER LINE; ORDER LINE keeps the two primary key attributes in the initial relation]
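
The three steps above can be sketched as projections of the INVOICE rows, where a projection keeps a subset of columns and drops duplicate tuples. This is an illustration only: the helper `project` is hypothetical, and the column identifier `OrderedQuantity` is an assumed spelling of the "Ordered Quantity" column.

```python
COLS = ["OrderID", "OrderDate", "CustomerID", "CustomerName", "CustomerAddress",
        "ProductID", "ProductDescription", "ProductFinish",
        "ProductStandardPrice", "OrderedQuantity"]
ROWS = [
    ("OD1", "20220929", "C1", "Nick", "address_1", "P1", "GoodProduct", "GoodFinish", 10, 5),
    ("OD2", "20220929", "C2", "Tom", "address_2", "P1", "GoodProduct", "GoodFinish", 10, 10),
    ("OD3", "20220929", "C3", "Harry", "address_3", "P1", "GoodProduct", "GoodFinish", 10, 20),
    ("OD4", "20220929", "C4", "Andrea", "address_4", "P2", "BetterProduct", "BetterFinish", 20, 30),
    ("OD5", "20220930", "C1", "Nick", "address_1", "P2", "BetterProduct", "BetterFinish", 20, 10),
    ("OD6", "20220930", "C2", "Tom", "address_2", "P2", "BetterProduct", "BetterFinish", 20, 20),
    ("OD7", "20220930", "C3", "Harry", "address_3", "P3", "BestProduct", "BestFinish", 30, 30),
    ("OD8", "20220930", "C4", "Andrea", "address_4", "P3", "BestProduct", "BestFinish", 30, 10),
]

def project(columns):
    """Project ROWS onto the given columns, dropping duplicates but keeping first-seen order."""
    idx = [COLS.index(c) for c in columns]
    return list(dict.fromkeys(tuple(row[i] for i in idx) for row in ROWS))

# Steps 1-2: one new relation per part of the composite key,
# holding the attributes that depend only on that part.
customer_order = project(["OrderID", "OrderDate", "CustomerID", "CustomerName", "CustomerAddress"])
product = project(["ProductID", "ProductDescription", "ProductFinish", "ProductStandardPrice"])
# Step 3: the initial relation keeps the full key and what depends on all of it.
order_line = project(["OrderID", "ProductID", "OrderedQuantity"])

print(len(customer_order), len(product), len(order_line))  # 8 3 8
```

PRODUCT shrinks from eight rows to three, because each product's description, finish, and price are now stored once.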

How to Make Second Normal Form (2NF), cont’d
[Figure: the three relations produced by the decomposition: (1) CUSTOMER ORDER, (2) PRODUCT, and (3) ORDER LINE. All have NO partial dependencies (2NF). CUSTOMER ORDER still has transitive dependencies, while PRODUCT and ORDER LINE have NO transitive dependencies (3NF)]
• Transitive dependency: when a non-key attribute in a relation depends on the primary key only through another non-key attribute
• Two Transitive Dependencies in CUSTOMER ORDER (thus, NOT in 3NF)
▪ OrderID → CustomerID → CustomerName
▪ OrderID → CustomerID → CustomerAddress

How to Make Second Normal Form (2NF), cont’d
• Difference between Partial and Transitive Dependencies
▪ Partial dependencies: when a relation has a composite primary key, which consists of multiple primary key attributes, and some attributes depend on one of them, while the other attributes depend on another primary key attribute
[Figure: INVOICE, with two primary key attributes, OrderID and ProductID]
▪ Transitive dependencies: when a relation has a single primary key and some attributes depend on another non-key attribute, which can determine those attributes
[Figure: CUSTOMER ORDER, with one primary key and a non-key attribute that determines other attributes]

How to Make Second Normal Form (2NF), cont’d
• Why Do Transitive Dependencies Matter?

CUSTOMER ORDER (2NF)
Order ID   Order Date   Customer ID   Customer Name   Customer Address
OD1        20220929     C1            Nick            address_1
OD2        20220929     C2            Tom             address_2
OD3        20220929     C3            Harry           address_3
OD4        20220929     C4            Andrea          address_4
OD5        20220930     C1            Nick            address_1
OD6        20220930     C2            Tom             address_2
OD7        20220930     C3            Harry           address_3
OD8        20220930     C4            Andrea          address_4

CUSTOMER ORDER in 2NF still has redundancies in the relation, caused by transitive dependencies.

How to Make Third Normal Form (3NF)
• In addition to 2NF, no transitive dependencies
• Solution: non-key determinant (CustomerID in the example) becomes the primary key in the new table and stays as foreign key in the old table
[Figure: CUSTOMER ORDER divided into two relations, using the non-key determinant (i.e., CustomerID) as the primary key in the new table, while staying as foreign key in the old table]

How to Make Third Normal Form (3NF), cont’d

CUSTOMER ORDER (2NF)
Order ID   Order Date   Customer ID   Customer Name   Customer Address
OD1        20220929     C1            Nick            address_1
OD2        20220929     C2            Tom             address_2
OD3        20220929     C3            Harry           address_3
OD4        20220929     C4            Andrea          address_4
OD5        20220930     C1            Nick            address_1
OD6        20220930     C2            Tom             address_2
OD7        20220930     C3            Harry           address_3
OD8        20220930     C4            Andrea          address_4

ORDER (3NF)
Order ID   Order Date   Customer ID
OD1        20220929     C1
OD2        20220929     C2
OD3        20220929     C3
OD4        20220929     C4
OD5        20220930     C1
OD6        20220930     C2
OD7        20220930     C3
OD8        20220930     C4

CUSTOMER (3NF)
Customer ID   Customer Name   Customer Address
C1            Nick            address_1
C2            Tom             address_2
C3            Harry           address_3
C4            Andrea          address_4

Data redundancies are removed by dividing CUSTOMER ORDER into two relations.
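
The split into ORDER and CUSTOMER can be reproduced, and checked for losslessness, with a short Python sketch. The variable names are hypothetical; the rows are those of CUSTOMER ORDER (2NF) above.

```python
# CUSTOMER ORDER (2NF): (OrderID, OrderDate, CustomerID, CustomerName, CustomerAddress)
customer_order = [
    ("OD1", "20220929", "C1", "Nick", "address_1"),
    ("OD2", "20220929", "C2", "Tom", "address_2"),
    ("OD3", "20220929", "C3", "Harry", "address_3"),
    ("OD4", "20220929", "C4", "Andrea", "address_4"),
    ("OD5", "20220930", "C1", "Nick", "address_1"),
    ("OD6", "20220930", "C2", "Tom", "address_2"),
    ("OD7", "20220930", "C3", "Harry", "address_3"),
    ("OD8", "20220930", "C4", "Andrea", "address_4"),
]

# ORDER (3NF): CustomerID stays behind as a foreign key.
order = [(oid, date, cid) for oid, date, cid, _name, _addr in customer_order]
# CUSTOMER (3NF): the non-key determinant CustomerID becomes the primary key.
customer = sorted({(cid, name, addr) for _oid, _date, cid, name, addr in customer_order})

# Joining the two relations on CustomerID rebuilds the original rows,
# so no information was lost by the decomposition.
by_id = {cid: (name, addr) for cid, name, addr in customer}
rejoined = [(oid, date, cid, *by_id[cid]) for oid, date, cid in order]

print(len(order), len(customer), rejoined == customer_order)  # 8 4 True
```

Each customer's name and address now appear once instead of twice, so a change of address touches a single row.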

Summary of Data Normalization
[Figure: INVOICE decomposed into PRODUCT, ORDER LINE, ORDER, and CUSTOMER. Each relation represents one single "entity", such as product, order line, order, and customer; ORDER LINE plays the role of an associative relation]

Section 5
Denormalization

To Normalize or Not
• Is data normalization ALWAYS beneficial for everyone?

❖ Non-Technical Stakeholders
• Business/system analysts
• Project managers
• Accounting managers
• Users

OR

❖ Technical Stakeholders
• Database analysts/architects
• Data administrators
• Programmers
• Network/security manager

Denormalization
• Conducted when normalized relations are not user friendly, requiring excessive data processing time and cost
• Transforms normalized relations into non-normalized relations based on needs
▪ Normalized Relations
[Figure: normalized relations, including the ITEM relation]
What if database system users frequently use Description in ITEM relation, along with ItemID and Price?
▪ Denormalized Relations
[Figure: denormalized relations]

Benefits and Costs of Denormalization
• Benefits:
▪ Improve performance (speed) by reducing the number of join queries
▪ Provide more comprehensive, user friendly data
• Costs (due to data duplication)
▪ Wasted storage space
▪ Data integrity/consistency threats
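
The trade-off above can be sketched with `sqlite3`. The table names `ORDER_LINE` and `ORDER_LINE_DENORM`, the reduced column set, and the value 'GreatProduct' are hypothetical; the three order lines are borrowed from the INVOICE example in Section 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: product facts are stored once.
CREATE TABLE PRODUCT (ProductID TEXT PRIMARY KEY, ProductDescription TEXT);
CREATE TABLE ORDER_LINE (OrderID TEXT, ProductID TEXT, OrderedQuantity INTEGER,
                         PRIMARY KEY (OrderID, ProductID));
INSERT INTO PRODUCT VALUES ('P1', 'GoodProduct'), ('P2', 'BetterProduct');
INSERT INTO ORDER_LINE VALUES ('OD1', 'P1', 5), ('OD2', 'P1', 10), ('OD4', 'P2', 30);
-- Denormalized copy: the description is duplicated into every order line.
CREATE TABLE ORDER_LINE_DENORM AS
    SELECT ol.OrderID, ol.ProductID, p.ProductDescription, ol.OrderedQuantity
    FROM ORDER_LINE AS ol JOIN PRODUCT AS p ON p.ProductID = ol.ProductID;
""")

# Benefit: the same answer without a join.
joined = conn.execute("""
    SELECT ol.OrderID, p.ProductDescription
    FROM ORDER_LINE AS ol JOIN PRODUCT AS p ON p.ProductID = ol.ProductID
    ORDER BY ol.OrderID
""").fetchall()
no_join = conn.execute(
    "SELECT OrderID, ProductDescription FROM ORDER_LINE_DENORM ORDER BY OrderID"
).fetchall()

# Cost: a partial update of the duplicated column leaves inconsistent copies.
conn.execute(
    "UPDATE ORDER_LINE_DENORM SET ProductDescription = 'GreatProduct' WHERE OrderID = 'OD1'"
)
copies = conn.execute(
    "SELECT COUNT(DISTINCT ProductDescription) FROM ORDER_LINE_DENORM WHERE ProductID = 'P1'"
).fetchone()[0]
print(joined == no_join, copies)  # True 2
```

After the partial update, product P1 has two different descriptions in the denormalized table, which is the update anomaly that normalization was meant to prevent.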

Denormalization with Cautions
• Potential problems of denormalization
  • Increases the chance of errors and inconsistencies and reintroduces anomalies
  • Forces reprogramming when business rules change
• Alternative solutions may exist to improve performance of joins
  • Adding data warehouse or OLAP (Online Analytical Processing) or business intelligence between databases and users
  • Clustering relevant tables in a single database

References
• The major contents of this note are reproduced from the textbook of BCIS 5420: Topi et al. Modern Database Management. 13th edition. Pearson, 2019
• Unless having a specific reference source, the photos and icons used in this material are from the following sources providing copyright free images: imagesource.com, iconfinder.com, and pexels.com
• The diagrams used are from the textbook publisher’s materials
