
8.

Being Normal: Normalization and Other Basic Design Issues


Functional Dependency
A functional dependency (FD) is a constraint between two sets of attributes in a relation. It
says that if two tuples have the same values for attributes A1, A2, ..., An, then those two tuples must
also have the same values for attributes B1, B2, ..., Bn.
A functional dependency is written with an arrow (→): X → Y means that X functionally determines Y.
The attributes on the left‐hand side determine the values of the attributes on the right‐hand side.

Trivial Functional Dependency


• Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial FDs always
hold.
• Non‐trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non‐trivial FD.
• Completely non‐trivial − If an FD X → Y holds, where X ∩ Y = ∅, it is said to be a completely non‐trivial FD.
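
The definitions above can be tried out on a small relation instance. The following is a minimal sketch, assuming a relation is held as a list of Python dicts; the function names and the sample rows are hypothetical and not part of the original notes. Note that checking sample data can only show that an FD is not violated by those rows, not that it holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by these rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault keeps the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


def classify_fd(lhs, rhs):
    """Classify X -> Y by the most specific of the three categories."""
    x, y = set(lhs), set(rhs)
    if y <= x:
        return "trivial"
    if x & y:
        return "non-trivial"
    return "completely non-trivial"


# Hypothetical sample data, for illustration only.
students = [
    {"Stu_ID": 1, "Stu_Name": "Ann", "City": "Oslo"},
    {"Stu_ID": 2, "Stu_Name": "Bob", "City": "Oslo"},
    {"Stu_ID": 3, "Stu_Name": "Ann", "City": "Rome"},
]

print(fd_holds(students, ["Stu_ID"], ["Stu_Name"]))  # Stu_ID -> Stu_Name
print(fd_holds(students, ["Stu_Name"], ["City"]))    # violated by the two rows named Ann
print(classify_fd(["Stu_ID", "City"], ["City"]))
```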

Database normalization, or simply normalization, is the process of organizing the columns (attributes) and tables
(relations) of a relational database to reduce data redundancy and improve data integrity.

Normalization is the process of organizing the data in a database to avoid data redundancy,
insertion anomalies, update anomalies, and deletion anomalies.

Normalization
If a database design is not sound, it may contain anomalies, which are a nightmare for any
database administrator. Managing a database with anomalies is next to impossible.
• Update anomalies − If data items are scattered and not linked to each other properly,
strange situations can arise. For example, when we try to update a data item
whose copies are scattered over several places, a few instances get updated properly while
others are left with old values. Such instances leave the database in an inconsistent state.
• Deletion anomalies − We try to delete a record, but parts of it are left undeleted because,
unknown to us, the same data is also saved somewhere else.
• Insert anomalies − We try to insert data into a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a consistent
state.
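
An update anomaly can be reproduced in a few lines. This is only an illustrative sketch: the staff and branch rows are hypothetical, and plain Python lists and dicts stand in for database tables.

```python
# Hypothetical unnormalized table: the branch address is repeated on every staff row.
staff = [
    {"staff_no": "S1", "branch_no": "B1", "branch_address": "1 Main St"},
    {"staff_no": "S2", "branch_no": "B1", "branch_address": "1 Main St"},
]

# Update anomaly: only one copy of the address gets changed.
staff[0]["branch_address"] = "9 High St"
addresses_for_b1 = {r["branch_address"] for r in staff if r["branch_no"] == "B1"}
inconsistent = len(addresses_for_b1) > 1  # the same branch now has two addresses

# Normalized layout: the address is stored once, keyed by branch.
branches = {"B1": "1 Main St"}
staff_normalized = [
    {"staff_no": "S1", "branch_no": "B1"},
    {"staff_no": "S2", "branch_no": "B1"},
]
branches["B1"] = "9 High St"  # a single update, visible from every staff row
addresses_seen = {branches[r["branch_no"]] for r in staff_normalized}

print(inconsistent, addresses_seen)
```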

First Normal Form


First Normal Form is part of the definition of relations (tables) itself. The rule states that
all the attributes in a relation must have atomic domains. The values in an atomic domain are
indivisible units.

We re‐arrange the relation (table) as below, to convert it to First Normal Form.


Each attribute must contain only a single value from its pre‐defined domain.
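
A minimal sketch of such a rearrangement follows. The original table is not reproduced in these notes, so the `courses` rows are hypothetical; the idea is simply to emit one row per value of the multi-valued attribute.

```python
def to_first_normal_form(rows, attribute):
    """Split a list-valued attribute so that every row holds one atomic value."""
    flat = []
    for row in rows:
        for value in row[attribute]:
            new_row = dict(row)          # copy the other attributes unchanged
            new_row[attribute] = value   # replace the list with a single value
            flat.append(new_row)
    return flat


# Hypothetical relation that violates 1NF: Content holds several values per row.
courses = [
    {"Course": "Programming", "Content": ["Java", "C++"]},
    {"Course": "Web", "Content": ["HTML", "PHP", "ASP"]},
]

flat_courses = to_first_normal_form(courses, "Content")
for row in flat_courses:
    print(row)
```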

Second Normal Form


Before we learn about the second normal form, we need to understand the following −
• Prime attribute − An attribute that is part of a candidate key (called the prime key in these notes) is known as a prime attribute.
• Non‐prime attribute − An attribute that is not part of any candidate key is said to be a
non‐prime attribute.
In second normal form, every non‐prime attribute must be fully functionally
dependent on the prime key. That is, if X → A holds, then there should not be any proper
subset Y of X for which Y → A also holds.

In the Student_Project relation, the prime key attributes are Stu_ID and Proj_ID.
According to the rule, the non‐key attributes, i.e. Stu_Name and Proj_Name, must depend on
both together and not on either prime key attribute individually. But we find that Stu_Name can be
identified by Stu_ID alone and Proj_Name can be identified by Proj_ID alone. This is called partial
dependency, which is not allowed in Second Normal Form.

We break the relation in two, as depicted in the picture above, so that no partial
dependency remains.
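
The partial dependencies in Student_Project can be found mechanically by testing every proper subset of the key against every non‐prime attribute. The sketch below assumes a relation stored as a list of dicts; the sample rows and function names are hypothetical, and a check on sample data only shows what those rows allow.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows, key, non_prime):
    """List (subset, attribute) pairs where a proper subset of the key
    already determines a non-prime attribute."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in non_prime:
                if fd_holds(rows, subset, [attr]):
                    found.append((subset, attr))
    return found


# Hypothetical rows for the Student_Project relation.
student_project = [
    {"Stu_ID": 1, "Proj_ID": 10, "Stu_Name": "Ann", "Proj_Name": "Solar"},
    {"Stu_ID": 1, "Proj_ID": 20, "Stu_Name": "Ann", "Proj_Name": "Wind"},
    {"Stu_ID": 2, "Proj_ID": 10, "Stu_Name": "Bob", "Proj_Name": "Solar"},
]

violations = partial_dependencies(
    student_project, ["Stu_ID", "Proj_ID"], ["Stu_Name", "Proj_Name"]
)
print(violations)
```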

Third Normal Form


For a relation to be in Third Normal Form, it must be in Second Normal Form and the following
must hold −
• No non‐prime attribute is transitively dependent on a prime key attribute.
• For any non‐trivial functional dependency X → A, either −
o X is a superkey, or
o A is a prime attribute.

In the Student_Detail relation above, Stu_ID is the key and the only prime key
attribute. City can be identified by Stu_ID as well as by Zip itself. Zip is not a superkey,
and City is not a prime attribute. Additionally, Stu_ID → Zip → City, so a transitive
dependency exists.
To bring this relation into third normal form, we break it into two relations: Student_Detail (Stu_ID, Stu_Name, Zip) and ZipCodes (Zip, City).
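
The split on Zip can be sketched as two projections, with a join to confirm that no information is lost. The rows below are hypothetical sample data, and `project` and `natural_join` are illustrative helpers rather than functions of any database library.

```python
def project(rows, attributes):
    """Project rows onto the given attributes, dropping duplicate tuples."""
    unique = {tuple(row[a] for a in attributes) for row in rows}
    return [dict(zip(attributes, values)) for values in sorted(unique)]


def natural_join(left, right, on):
    """Join two lists of dicts on one shared attribute."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


# Hypothetical rows for the original, unsplit relation.
student_detail = [
    {"Stu_ID": 1, "Stu_Name": "Ann", "Zip": "10001", "City": "New York"},
    {"Stu_ID": 2, "Stu_Name": "Bob", "Zip": "10001", "City": "New York"},
    {"Stu_ID": 3, "Stu_Name": "Cy", "Zip": "60601", "City": "Chicago"},
]

students = project(student_detail, ["Stu_ID", "Stu_Name", "Zip"])
zip_codes = project(student_detail, ["Zip", "City"])  # each Zip/City pair stored once

# Joining the two parts on Zip reproduces the original rows.
rejoined = natural_join(students, zip_codes, "Zip")
print(zip_codes)
print(sorted(rejoined, key=lambda r: r["Stu_ID"]) == student_detail)
```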

Boyce‐Codd Normal Form


Boyce‐Codd Normal Form (BCNF) is a stricter extension of Third Normal Form. BCNF
states that −
• For any non‐trivial functional dependency X → A, X must be a super‐key.
In the image above, Stu_ID is the super‐key in the relation Student_Detail and Zip is the super‐key
in the relation ZipCodes. So we have
Stu_ID → Stu_Name, Zip and Zip → City
which confirms that both relations are in BCNF.
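
One common way to test the superkey condition is to compute the attribute closure of each left‐hand side. The following is a sketch under the assumption that FDs are given as pairs of attribute tuples; `closure` and `is_bcnf` are hypothetical helper names, and only the FDs passed in are checked.

```python
def closure(attributes, fds):
    """Attribute closure: everything determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(relation, fds):
    """True if every non-trivial FD in `fds` has a superkey of `relation` on its left."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, always allowed
        if not set(relation) <= closure(lhs, fds):
            return False  # lhs does not determine the whole relation
    return True


all_fds = [(("Stu_ID",), ("Stu_Name", "Zip")), (("Zip",), ("City",))]

# The unsplit relation fails because Zip is not a superkey.
print(is_bcnf({"Stu_ID", "Stu_Name", "Zip", "City"}, all_fds))
# Each relation after the split passes.
print(is_bcnf({"Stu_ID", "Stu_Name", "Zip"}, [all_fds[0]]))
print(is_bcnf({"Zip", "City"}, [all_fds[1]]))
```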

What is a dependency in a database?


In relational database theory, a functional dependency is a constraint between two sets of
attributes in a relation from a database. In other words, functional dependency is a constraint that
describes the relationship between attributes in a relation.

Partial dependency means that a nonprime attribute is functionally dependent on part of a
candidate key. (A nonprime attribute is an attribute that's not part of any candidate key.)

A partial dependency is a functional dependency in which only part of a
primary key determines another attribute or set of attributes. It occurs when a non‐key
attribute of a table depends on the value of only a part of the table’s primary
key rather than the entire primary key.
What is transitive functional dependency?
In a database management system, a transitive dependency is a functional dependency that holds
by virtue of transitivity. A transitive dependency can occur only in a relation that has three or more
attributes. Let A, B, and C designate three distinct attributes (or distinct collections of attributes) in
the relation. If A → B and B → C hold, and B → A does not, then A → C, which follows by transitivity, is a
transitive dependency.

The difference between partial and transitive dependency


A partial dependency is a dependency where B is functionally dependent on A (A → B), but
some attribute can be removed from A and the dependency still holds.
For instance, given the dependency staffNo, sName → branchNo, every pair
(staffNo, sName) has only one value of branchNo, but since staffNo alone already determines branchNo
and sName contributes nothing, the dependency on the pair is only partial.
A transitive dependency is where A → B and B → C, and therefore A → C (provided that B → A
and C → A do not hold). In the relation with staffNo → sName, position, salary, branchNo, bAddress,
the dependency branchNo → bAddress makes bAddress transitively dependent on staffNo via branchNo. That is
the difference.

What is the primary difference between a partial dependency and a transitive dependency?
A. A partial dependency only reveals part of a functional dependency, and a transitive dependency
reveals the entire functional dependency.
B. A partial dependency’s left‐hand side of the equation is part of a key, and a transitive
dependency’s left‐hand side of the equation is not part of a key.
C. A transitive dependency has multiple attributes on the left‐hand side of the equation, and a partial
dependency only has one.
D. There is no difference.

A dependency occurs in a database when information stored in the same database table uniquely
determines other information stored in the same table.

For example, in a table listing employee characteristics including Social Security Number (SSN) and
name, it can be said that name is dependent upon SSN (or SSN ‐> name) because an employee's
name can be uniquely determined from their SSN.
However, the reverse statement (name ‐> SSN) is not true because more than one employee can
have the same name but different SSNs.

Trivial Functional Dependencies


A trivial functional dependency occurs when you describe a functional dependency of an
attribute on a collection of attributes that includes the original attribute. For example, “{A, B} ‐> B”
is a trivial functional dependency, as is “{name, SSN} ‐> SSN”. This type of functional dependency is
called trivial because it can be derived from common sense. It is obvious that if you already know the
value of B, then the value of B can be uniquely determined by that knowledge.

Transitive Dependencies
Transitive dependencies occur when there is an indirect relationship that causes a functional
dependency. For example, “A ‐> C” is a transitive dependency when it is true only because both
“A ‐> B” and “B ‐> C” are true.
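
Given a list of known dependencies, such chains can be searched for directly. This is a simplified sketch that assumes every FD relates one attribute to one attribute and looks only at the FDs supplied, not at ones implied by them; the function name and the staff FDs used as input are illustrative.

```python
def transitive_dependencies(fds):
    """Find chains A -> B -> C where B -> A is not among the given FDs.

    `fds` is a set of (determinant, dependent) pairs over single attributes.
    Each result (A, B, C) means that C depends on A through B.
    """
    found = set()
    for a, b in fds:
        for b2, c in fds:
            if b == b2 and c != a and c != b and (b, a) not in fds:
                found.add((a, b, c))
    return found


staff_fds = {
    ("staffNo", "sName"),
    ("staffNo", "branchNo"),
    ("branchNo", "bAddress"),
}

chains = transitive_dependencies(staff_fds)
print(chains)
```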
Importance of Dependencies
Database dependencies are important to understand because they provide the basic building blocks
used in database normalization. For example:
• For a table to be in second normal form (2NF), there must be no case of a non‐prime attribute in the
table that is functionally dependent upon a proper subset of a candidate key.
• For a table to be in third normal form (3NF), every non‐prime attribute must have a non‐transitive
functional dependency on every candidate key.
• For a table to be in Boyce‐Codd Normal Form (BCNF), every functional dependency (other than trivial
dependencies) must be on a superkey.
• For a table to be in fourth normal form (4NF), every non‐trivial multivalued dependency must be on a superkey.

Online Transaction Processing (OLTP)


In‐Memory OLTP improves throughput and reduces latency for transaction processing, and can help improve the performance
of transient data scenarios such as temp tables and ETL. It is a memory‐optimized database engine
integrated into the SQL Server engine and optimized for transaction processing.

Operational, or online transaction processing (OLTP)

ETL is short for extract, transform, load, three database functions that are combined into one tool to pull data out
of one database and place it into another database.
• Extract is the process of reading data from a database.
• Transform is the process of converting the extracted data from its previous form into the form it needs to be in so
that it can be placed into another database. Transformation occurs by using rules or lookup tables or by combining
the data with other data.
• Load is the process of writing the data into the target database.
ETL is used to migrate data from one database to another, to form data marts and data warehouses, and also to
convert databases from one format or type to another.
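
The three steps can be sketched with Python's standard `sqlite3` module. Everything here is an assumption made for illustration: the two in‐memory databases, the `sales` and `monthly_sales` tables, and the transformation rule (cents to currency units, summed per month).

```python
import sqlite3

# Hypothetical source database with raw sales rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (sold_on TEXT, amount_cents INTEGER)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01-05", 1250), ("2024-01-09", 800), ("2024-02-01", 4000)],
)

# Hypothetical target database with a different shape.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE monthly_sales (month TEXT PRIMARY KEY, total REAL)")


def extract(conn):
    """Extract: read rows from the source database."""
    return conn.execute("SELECT sold_on, amount_cents FROM sales").fetchall()


def transform(rows):
    """Transform: convert cents to currency units and combine rows by month."""
    totals = {}
    for sold_on, cents in rows:
        month = sold_on[:7]  # 'YYYY-MM'
        totals[month] = totals.get(month, 0) + cents / 100
    return sorted(totals.items())


def load(conn, rows):
    """Load: write the transformed rows into the target database."""
    conn.executemany("INSERT INTO monthly_sales VALUES (?, ?)", rows)
    conn.commit()


load(target, transform(extract(source)))
loaded = target.execute(
    "SELECT month, total FROM monthly_sales ORDER BY month"
).fetchall()
print(loaded)
```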

OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional analysis of business data and provides
the capability for complex calculations, trend analysis, and sophisticated data modeling.

Data Warehouse
A data warehouse is a:
subject‐oriented
integrated
time‐varying
non‐volatile
collection of data in support of the management's decision‐making process.
A data warehouse is a centralized repository that stores data from multiple information sources and transforms them into
a common, multidimensional data model for efficient querying and analysis.

A relationship works by matching data in key columns — usually columns with the same name in both tables. In most
cases, the relationship matches the primary key from one table, which provides a unique identifier for each row, with
an entry in the foreign key in the other table.

There are several types of database relationships. Today we are going to cover the following:
• One to One Relationships
• One to Many and Many to One Relationships
• Many to Many Relationships
• Self‐Referencing Relationships
When selecting data from multiple tables with relationships, we will be using the JOIN query. There
are several types of JOINs, and we are going to learn about the following:
• Cross Joins
• Natural Joins
• Inner Joins
• Left (Outer) Joins
• Right (Outer) Joins
We will also learn about the ON clause and the USING clause.
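
As a quick preview, the join types can be compared side by side on two tiny tables. This sketch uses Python's `sqlite3` module; the `customers` and `orders` tables and their rows are hypothetical. A right join is shown as a left join with the tables swapped, because older SQLite versions do not support RIGHT JOIN directly.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 50), (11, 1, 20), (12, 3, 75);
""")


def query(sql):
    return db.execute(sql).fetchall()


# Cross join: every customer paired with every order.
cross = query("SELECT * FROM customers CROSS JOIN orders")

# Inner join with an ON clause: only rows that match on customer_id.
inner_on = query("""
    SELECT customer_name, order_id FROM customers
    INNER JOIN orders ON customers.customer_id = orders.customer_id
    ORDER BY order_id
""")

# The same inner join written with USING, and as a natural join.
inner_using = query("""
    SELECT customer_name, order_id FROM customers
    INNER JOIN orders USING (customer_id) ORDER BY order_id
""")
natural = query("""
    SELECT customer_name, order_id FROM customers
    NATURAL JOIN orders ORDER BY order_id
""")

# Left join: customers without orders are kept, with NULL order columns.
left = query("""
    SELECT customer_name, order_id FROM customers
    LEFT JOIN orders USING (customer_id) ORDER BY customer_name, order_id
""")

# Right join of customers to orders, emulated by swapping the tables.
right_equivalent = query("""
    SELECT customer_name, order_id FROM orders
    LEFT JOIN customers USING (customer_id) ORDER BY order_id
""")

print(len(cross), inner_on, left, right_equivalent)
```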
One to One Relationships
Let's say you have a table for customers:
We can put the customer address information on a separate table:

Now we have a relationship between the Customers table and the Addresses table. If each address
can belong to only one customer, this relationship is "One to One". Keep in mind that this kind of
relationship is not very common. Our initial table that included the address along with the customer
could have worked fine in most cases.
Notice that there is now a field named "address_id" in the Customers table that refers to the
matching record in the Addresses table. This is called a "Foreign Key", and it is used for all kinds of
database relationships. We will cover this subject later in the article.
We can visualize the relationship between the customer and address records like this:

Note that the existence of a relationship can be optional, like having a customer record that has no
related address record.
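
A one to one relationship of this kind might be declared as follows. This is a sketch using Python's `sqlite3` module; apart from "address_id", the column names and rows are hypothetical because the original tables are not reproduced here. The UNIQUE constraint is what keeps an address from being shared, and a NULL "address_id" models the optional case.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
db.executescript("""
    CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, street TEXT);
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        customer_name TEXT,
        -- UNIQUE: an address can belong to at most one customer
        address_id INTEGER UNIQUE REFERENCES addresses(address_id)
    );
    INSERT INTO addresses VALUES (1, '12 Main St');
    -- Bob has no related address record (optional relationship)
    INSERT INTO customers VALUES (101, 'Ann', 1), (102, 'Bob', NULL);
""")

# A second customer pointing at the same address is rejected.
try:
    db.execute("INSERT INTO customers VALUES (103, 'Cy', 1)")
    shared_address_allowed = True
except sqlite3.IntegrityError:
    shared_address_allowed = False

customer_count = db.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(shared_address_allowed, customer_count)
```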

One to Many and Many to One Relationships


This is the most commonly used type of relationship. Consider an e‐commerce website, with the
following:
• Customers can make many orders.
• Orders can contain many items.
• Items can have descriptions in many languages.
In these cases we would need to create "One to Many" relationships. Here is an example:
Each customer may have zero, one or multiple orders. But an order can belong to only one customer.
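
In table form, the foreign key sits on the "many" side. The sketch below uses Python's `sqlite3` module with hypothetical column names and rows: each order row carries exactly one customer_id, while a customer may appear in zero, one, or several order rows.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
db.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        -- NOT NULL: every order belongs to exactly one customer
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount INTEGER
    );
    INSERT INTO customers VALUES (101, 'Ann'), (102, 'Bob');
    INSERT INTO orders VALUES (1, 101, 50), (2, 101, 20);
""")

# Count orders per customer; the LEFT JOIN keeps customers with zero orders.
orders_per_customer = db.execute("""
    SELECT c.customer_name, COUNT(o.order_id)
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id ORDER BY c.customer_id
""").fetchall()

# An order for a customer that does not exist is rejected by the foreign key.
try:
    db.execute("INSERT INTO orders VALUES (3, 999, 10)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orders_per_customer, orphan_rejected)
```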

Many to Many Relationships


In some cases, you may need multiple instances on both sides of the relationship. For example, each
order can contain multiple items. And each item can also be in multiple orders.
For these relationships, we need to create an extra table:

The Items_Orders table has only one purpose, and that is to create a "Many to Many" relationship
between the items and the orders.
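
A minimal sketch of such a linking table, using Python's `sqlite3` module, is shown below. The table name items_orders follows the text; the column names and sample rows are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    -- The linking table holds one row per (item, order) pair.
    CREATE TABLE items_orders (
        item_id INTEGER REFERENCES items(item_id),
        order_id INTEGER REFERENCES orders(order_id),
        PRIMARY KEY (item_id, order_id)
    );
    INSERT INTO items VALUES (1, 'Pen'), (2, 'Ink');
    INSERT INTO orders VALUES (10), (11);
    INSERT INTO items_orders VALUES (1, 10), (2, 10), (1, 11);
""")

# One order can contain many items ...
items_in_order_10 = [name for (name,) in db.execute("""
    SELECT item_name FROM items
    JOIN items_orders USING (item_id)
    WHERE order_id = 10 ORDER BY item_name
""")]

# ... and one item can appear in many orders.
orders_with_pen = [order_id for (order_id,) in db.execute(
    "SELECT order_id FROM items_orders WHERE item_id = 1 ORDER BY order_id"
)]

print(items_in_order_10, orders_with_pen)
```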
Here is how we can visualize this kind of relationship:
If you want to include the items_orders records in the graph, it may look like this:

Self‐Referencing Relationships
This is used when a table needs to have a relationship with itself. For example, let's say you have a
referral program. Customers can refer other customers to your shopping website. The table may look
like this:

Customers 102 and 103 were referred by the customer 101.


This actually can also be similar to "one to many" relationship since one customer can refer multiple
customers. Also it can be visualized like a tree structure:

One customer might refer zero, one or multiple customers. Each customer can be referred by only one
customer, or none at all.
If you would like to create a self‐referencing "many to many" relationship, you would need an extra
table, just like we talked about in the last section.
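
The referral example can be sketched with a foreign key that points back at its own table. This uses Python's `sqlite3` module; the column name referrer_id and the customer names are hypothetical, while the IDs 101 to 103 follow the example above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        customer_name TEXT,
        -- Points at another row of the same table; NULL means no referrer.
        referrer_id INTEGER REFERENCES customers(customer_id)
    );
    INSERT INTO customers VALUES
        (101, 'Ann', NULL), (102, 'Bob', 101), (103, 'Cy', 101);
""")

# Self join: the table is used twice under two aliases.
referrals = db.execute("""
    SELECT referrer.customer_id, referred.customer_id
    FROM customers AS referred
    JOIN customers AS referrer ON referred.referrer_id = referrer.customer_id
    ORDER BY referred.customer_id
""").fetchall()

print(referrals)
```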
DIAGRAMMING DATABASES
Entity‐relationship diagrams (ERDs) are an important tool in good database design.
What is an ERD?
An entity‐relationship diagram (ERD) is a data modeling technique that graphically illustrates an information system’s
entities and the relationships between those entities. An ERD is a conceptual and representational model of data used to represent
the entity framework infrastructure.
The elements of an ERD are:
• Entities
• Relationships
• Attributes
Steps involved in creating an ERD include:
1. Identifying and defining the entities
2. Determining all interactions between the entities
3. Analyzing the nature of interactions/determining the cardinality of the relationships
4. Creating the ERD

DENORMALIZATION

Denormalization is the process of trying to improve the read performance of a database, at the expense of
losing some write performance, by adding redundant copies of data or by grouping data.

What is denormalized data?


Denormalization is the process of attempting to optimize the read performance of a database by adding
redundant data or by grouping data. In some cases, denormalization is a means of addressing performance or
scalability in relational database software.

What does Denormalization mean?


Denormalization is a strategy that database managers use to increase the performance of a database infrastructure. It involves
adding redundant data to a normalized database to reduce certain types of problems with database queries that combine data from
various tables into a single table. The definition of denormalization is dependent on the definition of normalization, which is defined
as the process of organizing a database into tables correctly to promote a given use.

Write down the difference between Database normalization and Denormalization.


Normalization:
Normalization is the process of organizing the fields and tables of a relational database to minimize
redundancy. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining
relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field
can be made in just one table and then propagated through the rest of the database using the defined relationships.
Denormalization:
Denormalization is the process of attempting to optimize the read performance of a database by adding
redundant data or by grouping data. In some cases, denormalization is a means of addressing performance or
scalability in relational database software.

What is the difference between Normalization and Denormalization?


‐ Normalization and denormalization are two processes that are completely opposite.
‐ Normalization is the process of dividing larger tables into smaller ones to reduce redundant data, while
denormalization is the process of adding redundant data to optimize performance.
‐ Normalization is carried out to prevent database anomalies.
‐ Denormalization is usually carried out to improve the read performance of the database, but because of the additional
constraints needed to keep the redundant data consistent, writes (i.e. insert, update and delete operations) can become slower.
Therefore, a denormalized database can offer worse write performance than a normalized database.
‐ It is often recommended that you should “normalize until it hurts, denormalize until it works”.
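
The read/write trade‐off can be seen in a small sketch. It uses Python's `sqlite3` module with hypothetical tables in which the customer name is deliberately copied into every order row: reads need no join, but a rename has to touch every copy.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    -- Denormalized: customer_name is copied into every order row.
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        customer_name TEXT
    );
    INSERT INTO customers VALUES (101, 'Ann');
    INSERT INTO orders VALUES (1, 101, 'Ann'), (2, 101, 'Ann');
""")

# Read path: the name comes straight from orders, with no join.
order_names = db.execute(
    "SELECT order_id, customer_name FROM orders ORDER BY order_id"
).fetchall()


def rename_customer(conn, customer_id, new_name):
    """Write path: every redundant copy must be updated in one transaction."""
    with conn:
        conn.execute(
            "UPDATE customers SET customer_name = ? WHERE customer_id = ?",
            (new_name, customer_id),
        )
        cursor = conn.execute(
            "UPDATE orders SET customer_name = ? WHERE customer_id = ?",
            (new_name, customer_id),
        )
    return 1 + cursor.rowcount  # one customer row plus every copied name


rows_touched = rename_customer(db, 101, "Anna")
names_after = {name for (name,) in db.execute("SELECT customer_name FROM orders")}
print(order_names, rows_touched, names_after)
```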
