Database normalization, or simply normalization, is the process of organizing the columns (attributes) and tables
(relations) of a relational database to reduce data redundancy and improve data integrity.
Normalization
If a database design is flawed, it may contain anomalies, which are a serious problem for any
database administrator. A database with anomalies is very hard to keep consistent.
Update anomalies − If data items are scattered and are not linked to each other properly,
then it could lead to strange situations. For example, when we try to update one data item
that has copies scattered over several places, a few instances get updated properly while a
few others are left with old values. Such instances leave the database in an inconsistent state.
Deletion anomalies − We tried to delete a record, but parts of it were left undeleted because,
unknown to us, the data was also saved somewhere else.
Insert anomalies − We tried to insert data into a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a consistent
state.
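The update anomaly described above can be reproduced in a few lines. This is only a minimal sketch, assuming a hypothetical unnormalized table named enrollment (not part of the original text) that repeats a student's city on every course row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical unnormalized table: the city is repeated on every course row.
conn.execute("CREATE TABLE enrollment (stu_id INTEGER, city TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?)",
    [(1, "Pune", "Math"), (1, "Pune", "Physics")],
)

# Update only one of the two copies of student 1's city.
conn.execute(
    "UPDATE enrollment SET city = 'Delhi' WHERE stu_id = 1 AND course = 'Math'"
)

# The same student now has two different cities: an inconsistent state.
cities = {row[0] for row in conn.execute(
    "SELECT city FROM enrollment WHERE stu_id = 1")}
print(cities)
```

Storing the city once, in a table keyed by the student, would make this partial update impossible.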
We see here in the Student_Project relation that the prime key attributes are Stu_ID and Proj_ID.
According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name, must depend on
both and not on either prime key attribute individually. But we find that Stu_Name can be
identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is called a partial
dependency, which is not allowed in Second Normal Form.
We broke the relation in two as depicted in the above picture. So there exists no partial
dependency.
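Since the picture is not reproduced here, the split can be sketched in code. This is one possible decomposition under stated assumptions: the sample rows are invented, and a third relation holding only the key pair is kept so that the student-to-project pairing is not lost.

```python
# Hypothetical rows of Student_Project: (Stu_ID, Proj_ID, Stu_Name, Proj_Name)
student_project = [
    (1, 10, "Asha", "Compiler"),
    (1, 11, "Asha", "Database"),
    (2, 10, "Ravi", "Compiler"),
]

# Project each partial dependency out into its own relation.
student = {(s, name) for s, _, name, _ in student_project}   # Stu_ID -> Stu_Name
project = {(p, name) for _, p, _, name in student_project}   # Proj_ID -> Proj_Name
assignment = {(s, p) for s, p, _, _ in student_project}      # key pair only

# Joining the pieces back reproduces the original rows, so nothing was lost.
names = dict(student)
titles = dict(project)
rebuilt = sorted((s, p, names[s], titles[p]) for s, p in assignment)
print(rebuilt == sorted(student_project))
```

Each name is now stored once per student or project rather than once per assignment.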
We find that in the above Student_detail relation, Stu_ID is the key and the only prime key
attribute. We find that City can be identified by Stu_ID as well as by Zip itself. Zip is not a superkey,
nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there exists a transitive
dependency.
To bring this relation into third normal form, we break the relation into two relations as follows:
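The original figure is not available, so the following is only a sketch of one plausible split. It assumes a Stu_Name column and a second table named zip_codes, neither of which is confirmed by the text: City moves into a table keyed by Zip, and Student_detail keeps only Zip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zip_codes (zip TEXT PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE student_detail (
        stu_id   INTEGER PRIMARY KEY,
        stu_name TEXT NOT NULL,
        zip      TEXT REFERENCES zip_codes(zip)
    );
    INSERT INTO zip_codes VALUES ('110001', 'New Delhi');
    INSERT INTO student_detail VALUES (1, 'Asha', '110001'), (2, 'Ravi', '110001');
""")

# City is stored once per Zip, so a single UPDATE fixes it for every student.
conn.execute("UPDATE zip_codes SET city = 'Delhi' WHERE zip = '110001'")
rows = conn.execute("""
    SELECT s.stu_id, z.city
    FROM student_detail AS s JOIN zip_codes AS z ON z.zip = s.zip
    ORDER BY s.stu_id
""").fetchall()
print(rows)
```

After the split, City depends only on the key of its own table, so the transitive dependency through Zip no longer exists inside a single relation.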
What is the primary difference between a partial dependency and a transitive dependency?
A. A partial dependency only reveals part of a functional dependency and a transitive dependency
reveals the entire functional dependency.
B. A partial dependency's left-hand side of the equation is part of a key, and a transitive
dependency's left-hand side of the equation is not part of a key.
C. A transitive dependency has multiple attributes on the left-hand side of the equation, and a
partial dependency only has one.
D. There is no difference.
A dependency occurs in a database when information stored in the same database table uniquely
determines other information stored in the same table.
For example, in a table listing employee characteristics including Social Security Number (SSN) and
name, it can be said that name is dependent upon SSN (or SSN -> name) because an employee's
name can be uniquely determined from their SSN.
However, the reverse statement (name -> SSN) is not true because more than one employee can
have the same name but different SSNs.
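The SSN example can be checked mechanically. The sketch below uses a hypothetical helper named holds and invented sample rows. It tests whether a dependency is violated by a given set of rows. Passing the check on sample data does not prove that the dependency holds for every possible row.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same left-hand side must always map to the same right-hand side.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Invented sample data: two employees share a name but not an SSN.
employees = [
    {"ssn": "111-11-1111", "name": "Pat Lee"},
    {"ssn": "222-22-2222", "name": "Pat Lee"},
    {"ssn": "333-33-3333", "name": "Sam Roy"},
]

ssn_to_name = holds(employees, ["ssn"], ["name"])   # SSN -> name
name_to_ssn = holds(employees, ["name"], ["ssn"])   # name -> SSN
print(ssn_to_name, name_to_ssn)
```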
Transitive Dependencies
Transitive dependencies occur when there is an indirect relationship that causes a functional
dependency. For example, "A -> C" is a transitive dependency when it is true only because both
"A -> B" and "B -> C" are true.
Importance of Dependencies
Database dependencies are important to understand because they provide the basic building blocks
used in database normalization. For example:
For a table to be in second normal form (2NF), there must be no case of a non-prime attribute in the
table that is functionally dependent upon a proper subset of a candidate key.
For a table to be in third normal form (3NF), every non‐prime attribute must have a non‐transitive
functional dependency on every candidate key.
For a table to be in Boyce-Codd Normal Form (BCNF), the left-hand side of every functional dependency
(other than trivial dependencies) must be a superkey.
For a table to be in fourth normal form (4NF), the left-hand side of every non-trivial multivalued
dependency must be a superkey.
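The superkey condition used in these definitions can be tested with an attribute closure. The following is a minimal sketch of the standard closure algorithm, applied to the Student_detail attributes used earlier. The Stu_Name attribute and the exact set of dependencies are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency once its whole left-hand side is reachable.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    """A superkey determines every attribute of the relation."""
    return closure(attrs, fds) == set(all_attrs)

all_attrs = {"Stu_ID", "Stu_Name", "Zip", "City"}
fds = [({"Stu_ID"}, {"Stu_Name", "Zip"}), ({"Zip"}, {"City"})]

# Stu_ID reaches City through Zip, which is the transitive dependency.
stu_closure = closure({"Stu_ID"}, fds)
# Dependencies whose left-hand side is not a superkey break the BCNF rule.
violations = [(lhs, rhs) for lhs, rhs in fds
              if not is_superkey(lhs, all_attrs, fds)]
print(stu_closure, violations)
```

Here only Zip -> City is reported, which matches the decomposition into a separate relation keyed by Zip.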
ETL is short for extract, transform, load, three database functions that are combined into one tool to pull data out
of one database and place it into another database.
Extract is the process of reading data from a database.
Transform is the process of converting the extracted data from its previous form into the form it needs to be in so
that it can be placed into another database. Transformation occurs by using rules or lookup tables or by combining
the data with other data.
Load is the process of writing the data into the target database.
ETL is used to migrate data from one database to another, to form data marts and data warehouses and also to
convert databases from one format or type to another.
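The three steps can be sketched as three small functions. This is an illustration only: the sales and fact_sales tables, the country lookup and the cents-to-units rule are all hypothetical, and two in-memory SQLite databases stand in for the source and target systems.

```python
import sqlite3

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE sales (id INTEGER, country TEXT, amount_cents INTEGER);
    INSERT INTO sales VALUES (1, 'in', 12550), (2, 'us', 9900), (3, 'xx', 100);
""")
target.execute("CREATE TABLE fact_sales (id INTEGER, country TEXT, amount REAL)")

# Hypothetical lookup table used by the transform step.
COUNTRY_LOOKUP = {"in": "India", "us": "United States"}

def extract(conn):
    """Read the raw rows from the source database."""
    return conn.execute("SELECT id, country, amount_cents FROM sales").fetchall()

def transform(rows):
    """Apply rules: map country codes via the lookup, convert cents to units."""
    out = []
    for row_id, code, cents in rows:
        out.append((row_id, COUNTRY_LOOKUP.get(code, "Unknown"), cents / 100))
    return out

def load(conn, rows):
    """Write the transformed rows into the target database."""
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()

load(target, transform(extract(source)))
loaded = target.execute("SELECT * FROM fact_sales ORDER BY id").fetchall()
print(loaded)
```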
OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional analysis of business data and provides
the capability for complex calculations, trend analysis, and sophisticated data modeling.
Data Warehouse
A data warehouse is a:
subject-oriented
integrated
time-variant
non-volatile
collection of data in support of the management's decision-making process.
A data warehouse is a centralized repository that stores data from multiple information sources and transforms them into
a common, multidimensional data model for efficient querying and analysis.
A relationship works by matching data in key columns — usually columns with the same name in both tables. In most
cases, the relationship matches the primary key from one table, which provides a unique identifier for each row, with
an entry in the foreign key in the other table.
There are several types of database relationships. Today we are going to cover the following:
One to One Relationships
One to Many and Many to One Relationships
Many to Many Relationships
Self‐Referencing Relationships
When selecting data from multiple tables with relationships, we will be using the JOIN query. There
are several types of JOINs, and we are going to learn about the following:
Cross Joins
Natural Joins
Inner Joins
Left (Outer) Joins
Right (Outer) Joins
We will also learn about the ON clause and the USING clause.
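As a quick preview, the sketch below runs a cross join, an inner join with ON and a left join with USING against SQLite. The customers and orders tables and their rows are assumptions made for this example. Right joins are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount      INTEGER
    );
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 50), (11, 1, 70);
""")

def q(sql):
    return conn.execute(sql).fetchall()

# Cross join: every customer paired with every order (2 x 2 rows).
cross = q("SELECT name, order_id FROM customers CROSS JOIN orders")

# Inner join with ON: only customers that have a matching order.
inner = q("""SELECT name, order_id FROM customers
             JOIN orders ON orders.customer_id = customers.customer_id
             ORDER BY order_id""")

# Left join with USING: customers without orders are kept, padded with NULL.
left = q("""SELECT name, order_id FROM customers
            LEFT JOIN orders USING (customer_id)
            ORDER BY name, order_id""")
print(len(cross), inner, left)
```

USING (customer_id) is shorthand for an ON clause that compares the identically named column in both tables.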
One to One Relationships
Let's say you have a table for customers:
We can put the customer address information on a separate table:
Now we have a relationship between the Customers table and the Addresses table. If each address
can belong to only one customer, this relationship is "One to One". Keep in mind that this kind of
relationship is not very common. Our initial table that included the address along with the customer
could have worked fine in most cases.
Notice that now there is a field named "address_id" in the Customers table, which refers to the
matching record in the Addresses table. This is called a "Foreign Key" and it is used for all kinds of
database relationships. We will cover this subject later in the article.
We can visualize the relationship between the customer and address records like this:
Note that the existence of a relationship can be optional, like having a customer record that has no
related address record.
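A minimal sketch of this layout in SQLite follows. The column names other than address_id are invented, and the UNIQUE constraint is an assumption about how one might enforce "one address per customer". The text itself does not say how the rule is enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        -- UNIQUE keeps the link one to one; NULL means no address record.
        address_id  INTEGER UNIQUE REFERENCES addresses(address_id)
    );
    INSERT INTO addresses VALUES (1, 'Lisbon');
    INSERT INTO customers VALUES (1, 'Ann', 1), (2, 'Bob', NULL);
""")

# A second customer pointing at the same address is rejected.
try:
    conn.execute("INSERT INTO customers VALUES (3, 'Cy', 1)")
    shared_address_allowed = True
except sqlite3.IntegrityError:
    shared_address_allowed = False

# The relationship is optional: Bob has no related address record.
without_address = conn.execute(
    "SELECT name FROM customers WHERE address_id IS NULL").fetchall()
print(shared_address_allowed, without_address)
```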
The Items_Orders table has only one purpose, and that is to create a "Many to Many" relationship
between the items and the orders.
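The role of the linking table can be sketched as follows, assuming simple hypothetical columns for the items and orders tables. Each row of items_orders pairs one item with one order, and two joins walk from orders to items through it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items  (item_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE items_orders (
        item_id  INTEGER REFERENCES items(item_id),
        order_id INTEGER REFERENCES orders(order_id),
        PRIMARY KEY (item_id, order_id)
    );
    INSERT INTO items VALUES (1, 'pen'), (2, 'ink');
    INSERT INTO orders VALUES (100), (101);
    -- Order 100 holds two items; the pen appears in two orders.
    INSERT INTO items_orders VALUES (1, 100), (2, 100), (1, 101);
""")

rows = conn.execute("""
    SELECT o.order_id, i.name
    FROM orders AS o
    JOIN items_orders AS io ON io.order_id = o.order_id
    JOIN items AS i ON i.item_id = io.item_id
    ORDER BY o.order_id, i.name
""").fetchall()
print(rows)
```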
Here is how we can visualize this kind of relationship:
If you want to include the items_orders records in the graph, it may look like this:
Self‐Referencing Relationships
This is used when a table needs to have a relationship with itself. For example, let's say you have a
referral program. Customers can refer other customers to your shopping website. The table may look
like this:
One customer might refer zero, one or multiple customers. Each customer can be referred by only one
customer, or none at all.
If you would like to create a self-referencing "many to many" relationship, you would need an extra
table, just like we talked about in the last section.
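The one-referrer-per-customer case can be sketched with a foreign key that points back at the same table. The column name referrer_id and the sample rows are assumptions, since the original table is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        -- Points at another row of the same table; NULL means not referred.
        referrer_id INTEGER REFERENCES customers(customer_id)
    );
    INSERT INTO customers VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cy', 1);
""")

# Self join: the table is read twice, once as customer and once as referrer.
rows = conn.execute("""
    SELECT c.name, r.name
    FROM customers AS c
    LEFT JOIN customers AS r ON r.customer_id = c.referrer_id
    ORDER BY c.customer_id
""").fetchall()
print(rows)
```

Ann referred two customers and was referred by nobody, which matches the "zero, one or multiple" rule described above.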
DIAGRAMMING DATABASES
Entity‐relationship diagrams (ERDs) are an important tool in good database design.
What is an ERD?
An entity‐relationship diagram (ERD) is a data modeling technique that graphically illustrates an information system’s
entities and the relationships between those entities. An ERD is a conceptual and representational model of data used to represent
the entity framework infrastructure.
The elements of an ERD are:
Entities
Relationships
Attributes
Steps involved in creating an ERD include:
1. Identifying and defining the entities
2. Determining all interactions between the entities
3. Analyzing the nature of interactions/determining the cardinality of the relationships
4. Creating the ERD
DENORMALIZATION
Denormalization is the process of trying to improve the read performance of a database, at the expense of
losing some write performance, by adding redundant copies of data or by grouping data.
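The trade-off can be illustrated with a small sketch. It assumes a hypothetical orders table that carries a redundant customer_name column: reads avoid a join, but a rename has to update every copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER REFERENCES customers(customer_id),
        customer_name TEXT  -- redundant copy, kept for join-free reads
    );
    INSERT INTO customers VALUES (1, 'Ann');
    INSERT INTO orders VALUES (10, 1, 'Ann'), (11, 1, 'Ann');
""")

# Read path: the name comes straight from orders, with no join.
fast_read = conn.execute(
    "SELECT order_id, customer_name FROM orders ORDER BY order_id").fetchall()

# Write path: a rename must touch the source row and every redundant copy.
with conn:
    conn.execute("UPDATE customers SET name = 'Anna' WHERE customer_id = 1")
    touched = conn.execute(
        "UPDATE orders SET customer_name = 'Anna' WHERE customer_id = 1"
    ).rowcount
print(fast_read, touched)
```

Both updates run in one transaction so that the copies cannot drift apart, which is exactly the update anomaly that normalization is meant to prevent.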