Relational database design (RDD) models information and data into a set of tables with rows and
columns. Each row of a relation/table represents a record, and each column represents an
attribute of data.
The Structured Query Language (SQL) is used to manipulate relational databases. The design of
a relational database is composed of four stages, where the data are modeled into a set of related
tables. The stages are:
• Define relations/attributes
• Define primary keys
• Define relationships
• Normalization
Functional Dependency
Functional Dependency (FD) is a constraint that determines the relation of one attribute to
another attribute in a Database Management System (DBMS).
Augmentation rule: When X -> Y holds and Z is a set of attributes, then XZ -> YZ also holds. That is, adding the same attributes to both sides does not change the basic dependency.
Transitivity rule: This rule is similar to the transitive rule in algebra: if X -> Y holds and Y -> Z holds, then X -> Z also holds. X -> Y is read as "X functionally determines Y".
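A functional dependency can be checked mechanically against the rows of a table. The following is a minimal sketch in Python; the employees relation and its attribute names are illustrative assumptions, not from the text:

```python
# Check whether a functional dependency X -> Y holds in a set of rows.

def fd_holds(rows, X, Y):
    """Return True if the values of X uniquely determine the values of Y."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False  # same X values, different Y values: FD violated
        seen[x_val] = y_val
    return True

employees = [
    {"emp_id": 1, "emp_name": "Alice", "dept": "HR"},
    {"emp_id": 2, "emp_name": "Bob",   "dept": "HR"},
    {"emp_id": 3, "emp_name": "Alice", "dept": "IT"},
]

print(fd_holds(employees, ["emp_id"], ["emp_name"]))  # True
print(fd_holds(employees, ["emp_name"], ["dept"]))    # False: two Alices, different depts
```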
There are mainly four types of Functional Dependency in DBMS:
• Multivalued Dependency
• Trivial Dependency
• Non-trivial Dependency
• Transitive Dependency
Multivalued dependency occurs in the situation where there are multiple independent
multivalued attributes in a single table. A multivalued dependency is a complete constraint
between two sets of attributes in a relation. It requires that certain tuples be present in a relation.
Consider the following multivalued dependency.
Example: maf_year and colour are independent of each other but both depend on car_model. In this example, these two columns are said to be multivalued dependent on car_model:
• car_model ->-> maf_year
• car_model ->-> colour
A functional dependency X -> Y is trivial if Y is a subset of X; such a dependency conveys no new information.
Example: {Emp_ID, Emp_Name} -> Emp_ID is a trivial functional dependency, because Emp_ID is a subset of {Emp_ID, Emp_Name}.
A non-trivial functional dependency occurs when X -> Y holds and Y is not a subset of X.
Example:
Company CEO Age
Alibaba Jack Ma 54
• {Company} -> {CEO} (if we know the company, we know its CEO's name)
• {Company} -> {Age} should also hold: if we know the company, we know its CEO's age.
Neither CEO nor Age is a subset of {Company}, so both dependencies are non-trivial.
Decomposition
Decomposition is the process of breaking a relation down into parts: it replaces a relation with a collection of smaller relations, i.e., it breaks one table into multiple tables in a database. If a relation is not decomposed properly, it may lead to problems such as loss of information.
Decomposition in DBMS removes
• redundancy
• anomalies
• inconsistencies
from a database by dividing the table into multiple tables.
The following are the types:
Lossless Decomposition:
A decomposition is lossless if it is feasible to reconstruct the original relation R from the decomposed tables using joins. This is the preferred choice: no information is lost from the relation when it is decomposed, and the join results in the same original relation.
• Consider a relation R which is decomposed into sub-relations R1, R2, …, Rn.
• This decomposition is called a lossless join decomposition when the join of the sub-relations results in the same relation R that was decomposed.
• For a lossless join decomposition, we always have
R1 ⋈ R2 ⋈ … ⋈ Rn = R, where ⋈ is the natural join operator.
Example-
Consider the following relation R( A , B , C ):
R( A , B , C )
A B C
1 2 1
2 5 3
3 3 3
Consider this relation decomposed into two sub-relations R1( A , B ) and R2( B , C ):
R1( A , B )
A B
1 2
2 5
3 3
R2( B , C )
B C
2 1
5 3
3 3
Joining them back, R1 ⋈ R2 gives:
A B C
1 2 1
2 5 3
3 3 3
Therefore, the above relation has a lossless decomposition, i.e., no loss of information.
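The R( A , B , C ) example above can be verified mechanically. A minimal sketch in Python, representing relations as sets of tuples:

```python
# Verify the lossless decomposition of R(A, B, C) into R1(A, B) and R2(B, C).
R = {(1, 2, 1), (2, 5, 3), (3, 3, 3)}  # tuples (a, b, c) from the example

R1 = {(a, b) for (a, b, c) in R}        # projection onto (A, B)
R2 = {(b, c) for (a, b, c) in R}        # projection onto (B, C)

# Natural join on the shared attribute B.
joined = {(a, b, c) for (a, b) in R1 for (b2, c) in R2 if b == b2}

print(joined == R)  # True: the join reproduces R exactly, so no information is lost
```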
Lossy Decomposition:
When a relation is decomposed into two or more relational schemas and information is lost when the original relation is reconstructed, the decomposition is lossy.
• Consider a relation R which is decomposed into sub-relations R1, R2, …, Rn.
• This decomposition is called a lossy join decomposition when the join of the sub-relations does not result in the same relation R that was decomposed.
• The natural join of the sub-relations is always found to contain some extraneous tuples.
• For a lossy join decomposition, we always have
R1 ⋈ R2 ⋈ … ⋈ Rn ⊃ R, where ⋈ is the natural join operator.
Example-
Consider the following relation R( A , B , C ):
R( A , B , C )
A B C
1 2 1
2 5 3
3 3 3
Consider this relation decomposed into two sub-relations R1( A , C ) and R2( B , C ):
R1( A , C )
A C
1 1
2 3
3 3
R2( B , C )
B C
2 1
5 3
3 3
Joining them back, R1 ⋈ R2 gives:
A B C
1 2 1
2 5 3
2 3 3
3 5 3
3 3 3
The tuples (2, 3, 3) and (3, 5, 3) are extraneous: they were not part of the original relation R.
As a second illustration, consider a table of employees and their departments decomposed so that Emp_ID is kept only in the employee table:
DeptDetails
Dept_ID Dept_Name
Dpt1 Operations
Dpt2 HR
Dpt3 Finance
Now, you won't be able to join the above tables, since Emp_ID isn't part of the DeptDetails relation.
Therefore, the above decompositions are lossy.
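The lossy R( A , B , C ) example can likewise be checked mechanically; joining the projections produces the extraneous tuples shown above:

```python
# Verify the lossy decomposition of R(A, B, C) into R1(A, C) and R2(B, C).
R = {(1, 2, 1), (2, 5, 3), (3, 3, 3)}

R1 = {(a, c) for (a, b, c) in R}        # projection onto (A, C)
R2 = {(b, c) for (a, b, c) in R}        # projection onto (B, C)

# Natural join on the shared attribute C.
joined = {(a, b, c) for (a, c) in R1 for (b, c2) in R2 if c == c2}

# The join contains the extraneous tuples (2, 3, 3) and (3, 5, 3),
# so R1 join R2 is a strict superset of R and the decomposition is lossy.
print(sorted(joined))
print(joined > R)  # True: strict superset
```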
Normalization
Normalization is the process of minimizing redundancy in a relation or set of relations. Redundancy in a relation may cause insertion, deletion and update anomalies, so normalization helps to minimize redundancy in relations. Normal forms are used to eliminate or reduce redundancy in database tables.
Advantages of Functional Dependency:
• Functional dependency avoids data redundancy, so the same data does not repeat at multiple locations in the database
• It helps you to maintain the quality of data in the database
• It helps you to define the meanings and constraints of databases
• It helps you to identify bad designs
• It helps you to find facts regarding the database design.
Normalization rules are divided into the following normal forms:
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. BCNF
First Normal Form (1NF)
Example:
Student_courses
Student Courses
Jane Smith Databases, Mathematics
John Lipinsky English Literature, Databases
Dave Beyer English Literature, Mathematics
This relation is not in 1NF because the Courses attribute has multiple values. Jane Smith is assigned to two courses (Databases and Mathematics), and they are stored in one field as a comma-separated list. This list can be broken down into smaller elements (i.e. course subjects: Databases as one element, Mathematics as another), so it is not an atomic value.
To transform this relation to the first normal form, we should store each course subject as a single value, so that each student-course assignment is a separate tuple:
Student_courses
Student Course
Jane Smith Databases
Jane Smith Mathematics
John Lipinsky English Literature
John Lipinsky Databases
Dave Beyer English Literature
Dave Beyer Mathematics
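The transformation to 1NF above can be sketched as a simple string split; the dictionary representation of the unnormalized table is ours:

```python
# Flatten the comma-separated course lists into atomic (student, course) rows.
unnormalized = {
    "Jane Smith": "Databases, Mathematics",
    "John Lipinsky": "English Literature, Databases",
    "Dave Beyer": "English Literature, Mathematics",
}

first_normal_form = [
    (student, course.strip())
    for student, courses in unnormalized.items()
    for course in courses.split(",")
]

for row in first_normal_form:
    print(row)  # one atomic student-course pair per tuple
```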
First Normal Form is defined in the definition of relations (tables) itself. This rule requires that all the attributes in a relation have atomic domains. The values in an atomic domain are indivisible units.
[Figure: the Student_prof relation with functional dependencies FD1, FD2, FD3]
Third Normal Form (3NF)
Third normal form (3NF) is a database schema design approach for relational databases which uses normalizing principles to reduce the duplication of data and avoid data anomalies.
Example relation:
Pnumber Pname Plocation
1 Monitor New York
2 Speaker Francisco
3 Mouse Chicago
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and, for each functional dependency X -> Y, at least one of the following conditions holds:
• X is a super key of the table, or
• every attribute of Y is a prime attribute.
An attribute that is a part of one of the candidate keys is known as a prime attribute.
A transitive dependency in a database is an indirect relationship between values in the same table that causes a functional dependency. To achieve the normalization standard of Third Normal Form (3NF), you must eliminate any transitive dependency.
The rule is essentially that A transitively determines C (A -> C) if A functionally determines B (A -> B) and B functionally determines C (B -> C), but B does not determine A.
Boyce–Codd normal form (BCNF)
BCNF is used in database normalization. It is a slightly stronger version of the third normal form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomalies not dealt with by 3NF as originally defined.
A table is in BCNF if it is in 3NF and, for every functional dependency A -> B, A is a super key of the table. The second condition sounds a bit tricky, right? In simple words, it means that for a dependency A -> B, A cannot be a non-prime attribute if B is a prime attribute.
To make this relation (table) satisfy BCNF, we will decompose it into two tables, a student table and a professor table.
Below we have the structure for both tables.
Student Table
student_id p_id
101 1
101 2
Professor Table
p_id p_name subject
1 P.Java Java
2 P.Cpp C++
Decomposition Algorithms
In the previous section, we discussed decomposition and its types with the help of small examples. In the real world, a database schema is too large to handle by inspection. Thus, we require algorithms that can generate appropriate decompositions.
Here, we will get to know the decomposition algorithms using functional dependencies for two
different normal forms, which are:
o Decomposition to BCNF
o Decomposition to 3NF
Decomposition to BCNF
Before applying the BCNF decomposition algorithm to the given relation, it is necessary to test if
the relation is in Boyce-Codd Normal Form. After the test, if it is found that the given relation is
not in BCNF, we can decompose it further to create relations in BCNF.
There are following cases which require to be tested if the given relation schema R satisfies the
BCNF rule:
Case 1: To check whether a nontrivial dependency α -> β violates the BCNF rule, evaluate and compute α+, i.e., the attribute closure of α, and verify whether α+ includes all the attributes of the given relation R, that is, whether α is a superkey of relation R. If it is not, the dependency violates BCNF.
Case 2: If the given relation R is in BCNF, it is not required to test all the dependencies in F+. It only requires determining and checking the dependencies in the provided dependency set F for the BCNF test. This is because if no dependency in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.
The following algorithm decomposes a given relation R into several relations R1, R2, …, Rn when R is not in BCNF. For every subset α of attributes in a relation Ri, we need to check that α+ (the attribute closure of α under F) either includes all the attributes of Ri or includes no attribute of Ri − α.
result = {R};
done = false;
compute F+;
while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
    then begin
        let α -> β be a nontrivial functional dependency that holds
        on Ri such that α -> Ri is not in F+, and α ∩ β = ∅;
        result = (result − Ri) ∪ (Ri − β) ∪ (α, β);
    end
    else done = true;
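The pseudocode above can be sketched as a small runnable program. The closure function implements the α+ computation from Case 1, and bcnf_decompose follows the result = (result − Ri) ∪ (Ri − β) ∪ (α, β) step. The Enrolment relation and its dependencies below are illustrative assumptions, not from the text:

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def bcnf_violation(Ri, fds):
    """Return (lhs, rhs) violating BCNF on schema Ri, or None if Ri is in BCNF."""
    for lhs, rhs in fds:
        rhs_in = (rhs & Ri) - lhs              # nontrivial part of rhs inside Ri
        if lhs <= Ri and rhs_in and not Ri <= closure(lhs, fds):
            return lhs, rhs_in                 # lhs is not a superkey of Ri
    return None

def bcnf_decompose(R, fds):
    """Split schemas with violating dependencies until every schema is in BCNF."""
    result = [frozenset(R)]
    while True:
        for Ri in result:
            violation = bcnf_violation(Ri, fds)
            if violation:
                lhs, rhs = violation
                # result = (result - Ri) U (Ri - rhs) U (lhs U rhs)
                result.remove(Ri)
                result += [Ri - rhs, lhs | rhs]
                break
        else:
            return result

# Illustrative schema: Enrolment(student, subject, professor) with
# professor -> subject and {student, subject} -> professor.
fds = [(frozenset({"professor"}), frozenset({"subject"})),
       (frozenset({"student", "subject"}), frozenset({"professor"}))]
parts = bcnf_decompose({"student", "subject", "professor"}, fds)
print([sorted(p) for p in parts])
# A BCNF decomposition: (student, professor) and (professor, subject)
```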
This algorithm decomposes the given relation R into several smaller relations, using the dependencies that violate BCNF to perform the decomposition. It not only generates sub-relations of R that are in BCNF but also guarantees a lossless decomposition: no data is lost while decomposing the given relation R into R1, R2, and so on.
Decomposition to 3NF
The decomposition algorithm for 3NF ensures the preservation of dependencies by explicitly
building a schema for each dependency in the canonical cover. It guarantees that at least one
schema must hold a candidate key for the one being decomposed, which in turn ensures the
decomposition generated to be a lossless decomposition.
Here, R is the given relation, F is the given set of functional dependencies, and Fc is the canonical cover of F. R1, R2, …, Ri are the decomposed parts of the given relation R. Thus, this algorithm preserves the dependencies as well as generating a lossless decomposition of relation R.
A 3NF algorithm is also known as a 3NF synthesis algorithm. It is called so because the normal
form works on a dependency set, and instead of repeatedly decomposing the initial schema, it
adds one schema at a time.
NOTE: Whenever a user updates the database, the system must check whether any of the functional
dependencies are getting violated in this process. A canonical cover of a set of functional
dependencies F is a simplified set of functional dependencies.
Modeling temporal data
Temporal data is simply data that represents a state in time. Temporal data is collected, for example, to analyze weather patterns and other environmental variables and to monitor traffic conditions.
A temporal database stores data relating to time instances. It offers temporal data types and
stores information relating to past, present and future time. Temporal databases could be uni-
temporal, bi-temporal or tri-temporal.
More specifically the temporal aspects usually include valid time, transaction time or decision
time.
Valid time is the time period during which a fact is true in the real world.
Transaction time is the time at which a fact was recorded in the database.
Decision time is the time at which the decision was made about the fact.
A temporal table is a table that records the period of time when a row is valid with respect to system time (or transaction time, when the transaction is recorded), business time (or valid time, when the data is valid with respect to information about the real world), or both.
You can use the workbench to model data that is based on time. Use the temporal features of
the workbench to create and modify temporal objects.
A period is an interval of time that is defined by two date or time columns in a temporal table. A
period contains a begin column and an end column. The begin column indicates the beginning of
the period, and the end column indicates the end of the period.
The beginning value of a period is inclusive, but the ending value of a period is exclusive. For
example, if the begin column has a value of 01/01/1995, that date is included in the row.
Whereas, if the end column has a value of 03/21/1995, that date is not within the period of the
row.
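The inclusive-begin, exclusive-end convention is a half-open interval test. A minimal sketch using the dates from the example:

```python
from datetime import date

# A period [begin, end): the begin value is inclusive, the end value is exclusive.
begin, end = date(1995, 1, 1), date(1995, 3, 21)

def in_period(d, begin, end):
    """True if date d falls within the half-open period [begin, end)."""
    return begin <= d < end

print(in_period(date(1995, 1, 1), begin, end))   # True: the begin date is included
print(in_period(date(1995, 3, 21), begin, end))  # False: the end date is excluded
```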
The workbench supports three types of temporal tables (DB2 Version 10 new-function mode, or DB2 for Linux, UNIX, and Windows Version 10 only):
System period
A system-period table maintains historical information that is based on system time (or transaction time, when the transaction is recorded). The system period consists of a pair of columns with system-maintained values that indicate the period of time when a transaction occurred. The begin column contains the date or timestamp value for when a row is created, either by an insert operation or an update operation on an existing row. The end column contains the timestamp value for when the row is no longer valid; this value is entered when a row is updated or deleted.
When you define a base table to use system-period data versioning, or when you define
system-period data versioning on an existing table, you must create a history table,
specify a name for the history table, and then you can create a table space to hold that
table.
Application period
An application-period table maintains information that is based on business time (or valid time). The application period consists of a pair of columns with application-maintained values that indicate the period of time when a row is valid with respect to the real world.
Bitemporal
A bitemporal table maintains historical information that is based on both system time and business time. You can use a bitemporal table to keep both application-period information and system-based historical information. Therefore, you have a lot of flexibility in how you query data based on periods of time.
INDEXING
Indexes are used to quickly locate data without having to search every row in a database table every time the table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
An index is a list of data, such as a group of files or database entries. It is typically stored in a form that can be quickly scanned by a search algorithm, which significantly speeds up searching and sorting operations on the data referenced by the index.
Types of Indexing
Indexing in Database is defined based on its indexing attributes. Two main types of indexing
methods are:
Primary Indexing
Secondary Indexing
Primary Index
A primary index is an ordered file of fixed-length records with two fields. The first field is the same as the primary key, and the second field points to the specific data block. In a primary index, there is always a one-to-one relationship between the entries in the index table and the records in the data file.
The primary Indexing in DBMS is also further divided into two types.
Dense Index
Sparse Index
Dense Index
In a dense index, an index record is created for every search-key value in the database. This helps you to search faster but needs more space to store the index records. In this indexing method, each index record contains a search-key value and a pointer to the real record on the disk.
Sparse Index
A sparse index holds index records for only some of the search-key values in the file. It helps to resolve the main issue of dense indexing in DBMS: index size. In this indexing technique, a range of records shares the same data block address in the index; when data needs to be retrieved, that block address is fetched and the block is scanned.
A sparse index needs less space and less maintenance overhead for insertions and deletions, but it is slower than a dense index for locating records.
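The sparse-index lookup described above can be sketched as a binary search over per-block anchor keys; the blocks and key values below are illustrative:

```python
import bisect

# A "file" of sorted records grouped into fixed-size blocks.
blocks = [
    [(1, "A"), (3, "B"), (5, "C")],
    [(7, "D"), (9, "E"), (11, "F")],
    [(13, "G"), (15, "H")],
]

# Sparse index: one entry per block, keyed by the block's first search key.
sparse_index = [block[0][0] for block in blocks]   # [1, 7, 13]

def lookup(key):
    """Find the last index entry <= key, then scan only that one block."""
    i = bisect.bisect_right(sparse_index, key) - 1
    if i < 0:
        return None
    for k, value in blocks[i]:
        if k == key:
            return value
    return None

print(lookup(9))   # "E": one index probe plus one block scan
print(lookup(4))   # None: key 4 is not in the file
```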
Secondary Index
The secondary Index in DBMS can be generated by a field which has a unique value for each
record, and it should be a candidate key. It is also known as a non-clustering index.
This two-level database indexing technique is used to reduce the mapping size of the first level.
For the first level, a large range of numbers is selected because of this; the mapping size always
remains small.
Here, the secondary index in DBMS has an index record for every search-key value. Each index record points to a bucket that contains pointers to all the records with that specific search-key value.
Clustering Index
In a clustered index, records themselves are stored in the index rather than pointers. Sometimes the index is created on non-primary key columns, which might not be unique for each record. In that case, you can group two or more columns to get unique values and create an index, which is called a clustered index. This also helps you to identify records faster.
Example:
Let's assume that a company recruited many employees in various departments. In this case, a clustering index in DBMS should be created for all employees who belong to the same department. All such employees are considered to be in a single cluster, and the index points to the cluster as a whole. Here, Department_no is a non-unique key.
A multilevel index in a database is created when a primary index does not fit in memory. In this type of indexing method, you can reduce the number of disk accesses needed to locate any record: the index itself is kept on disk as a sequential file, and a sparse index is built on that file.
B+ Tree
A B+ tree is a balanced search tree (not a binary tree: each node may have many children) that follows a multi-level index format. The leaf nodes of a B+ tree hold the actual data pointers. A B+ tree ensures that all leaf nodes remain at the same height, and is thus balanced. Additionally, the leaf nodes are linked in a linked list; therefore, a B+ tree can support both random access and sequential access.
Structure of B+ Tree
Every leaf node is at equal distance from the root node. A B + tree is of the order n where n is
fixed for every B+ tree.
B-trees are specialized m-way search trees, widely used for disk access. A B-tree of order m can have at most m − 1 keys and m children. Because it can store a large number of elements in a single node, its height is relatively small. This is one great advantage of B-trees.
A B-tree has all of the properties of an m-way tree, plus the following:
Every node in a B-tree holds at most m children.
Every node except the root and the leaves holds at least ⌈m/2⌉ children.
The root node has at least two children.
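These bounds can be checked with a tiny helper; the function name is ours:

```python
import math

# For a B-tree of order m: at most m children and m - 1 keys per node,
# and (except for the root and leaves) at least ceil(m / 2) children.
def btree_bounds(m):
    return {
        "max_children": m,
        "max_keys": m - 1,
        "min_children_internal": math.ceil(m / 2),
    }

print(btree_bounds(5))  # {'max_children': 5, 'max_keys': 4, 'min_children_internal': 3}
```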
Internal nodes −
Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
At most, an internal node can contain n pointers.
Leaf nodes −
Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
At most, a leaf node can contain n record pointers and n key values.
Every leaf node contains one block pointer P to point to next leaf node and forms a
linked list.
B+ Tree Insertion
B+ trees are filled from the bottom up, and each entry is made at a leaf node.
If a leaf node overflows −
o Split node into two parts.
o Partition at i = ⌊(m+1)/2⌋.
o First i entries are stored in one node.
o Rest of the entries (i+1 onwards) are moved to a new node.
o ith key is duplicated at the parent of the leaf.
If a non-leaf node overflows −
o Split node into two parts.
o Partition the node at i = ⌈(m+1)/2⌉.
o Entries up to i are kept in one node.
o Rest of the entries are moved to a new node.
B+ Tree Deletion
B+ tree entries are deleted at the leaf nodes.
The target entry is searched and deleted.
o If it is an internal node, delete and replace with the entry from the left position.
After deletion, underflow is tested,
o If underflow occurs, distribute the entries from the nodes left to it.
If distribution is not possible from left, then
o Distribute from the nodes right to it.
If distribution is not possible from left or from right, then
o Merge the node with left and right to it.
B+ Tree Insertion
Suppose we want to insert a record 60 in the below structure. It will go to the 3rd leaf node after
55. It is a balanced tree, and a leaf node of this tree is already full, so we cannot insert 60 there.
In this case, we have to split the leaf node so that the new key can be inserted into the tree without affecting the fill factor, balance and order.
After inserting 60, the 3rd leaf node would hold the values (50, 55, 60, 65, 70), and its parent (intermediate) node currently branches at 50. We will split the leaf node in the middle so that the tree's balance is not altered, grouping (50, 55) and (60, 65, 70) into two leaf nodes.
If these two are to be leaf nodes, the intermediate node cannot branch from 50 alone. It should have 60 added to it, and then we can have a pointer to the new leaf node.
This is how we can insert an entry when there is overflow. In a normal scenario, it is very easy to
find the node where it fits and then place it in that leaf node.
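The leaf-split step in the worked example above can be sketched as follows, using one common convention: the middle key starts the new right node and is copied up to the parent (leaf keys are duplicated upward, not moved):

```python
# Split an overflowing B+ tree leaf, following the worked example above.

def split_leaf(keys):
    """Split a sorted, overflowing leaf; return (left, right, key copied to parent)."""
    i = len(keys) // 2              # split point in the middle of the leaf
    left, right = keys[:i], keys[i:]
    copied_up = right[0]            # the middle key is duplicated at the parent
    return left, right, copied_up

leaf = [50, 55, 65, 70]             # a full leaf (order 5: at most 4 keys)
leaf = sorted(leaf + [60])          # inserting 60 overflows it
print(split_leaf(leaf))             # ([50, 55], [60, 65, 70], 60)
```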
B+ Tree Deletion
Suppose we want to delete 60 from the above example. In this case, we have to remove 60 from
the intermediate node as well as from the 4th leaf node too. If we remove it from the
intermediate node, then the tree will not satisfy the rule of the B+ tree. So we need to modify it to
have a balanced tree.
After deleting node 60 from the above B+ tree and re-arranging the nodes, the tree once again satisfies the B+ tree rules.