
UNIT III

RELATIONAL DATABASE DESIGN

Relational Database Design

Relational database design (RDD) models information and data into a set of tables with rows and columns. Each row of a relation/table represents a record, and each column represents an attribute of the data. The Structured Query Language (SQL) is used to manipulate relational databases. The design of a relational database is composed of four stages, in which the data are modeled into a set of related tables.

Relational Database Design Process:

• Define the Purpose of the Database (Requirement Analysis)

• Gather Data, Organize in tables and Specify the Primary Keys

• Create Relationships among Tables

• Refine & Normalize the Design

The four design stages are:

• Define relations/attributes

• Define primary keys

• Define relationships

• Normalization
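The stages above can be sketched end to end with SQLite from Python. The schema below (department/employee tables and all column names) is a hypothetical illustration, not taken from the text:

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stages 1-2: define relations/attributes and primary keys.
cur.execute("""CREATE TABLE department (
    dept_id   TEXT PRIMARY KEY,
    dept_name TEXT NOT NULL
)""")

# Stage 3: define relationships via a foreign key.
cur.execute("""CREATE TABLE employee (
    emp_id   TEXT PRIMARY KEY,
    emp_name TEXT NOT NULL,
    dept_id  TEXT REFERENCES department(dept_id)
)""")

cur.execute("INSERT INTO department VALUES ('Dpt1', 'Operations')")
cur.execute("INSERT INTO employee VALUES ('E001', 'Jacob', 'Dpt1')")

# SQL is then used to manipulate the related tables.
cur.execute("""SELECT e.emp_name, d.dept_name
               FROM employee e JOIN department d ON e.dept_id = d.dept_id""")
print(cur.fetchall())  # [('Jacob', 'Operations')]
```

Stage 4 (normalization) is covered in the sections that follow.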

Functional Dependency
Functional Dependency (FD) is a constraint that determines the relation of one attribute to
another attribute in a Database Management System (DBMS).

Functional Dependency helps to maintain the quality of data in the database.

A functional dependency is denoted by an arrow "→". The functional dependency of Y on X is represented by X → Y. Let's understand functional dependency in DBMS with an example: given an Employee number, we can obtain Employee Name, city, salary, etc. Because of this, we can say that city, Employee Name, and salary are functionally dependent on Employee number.

Rules of Functional Dependencies:

Three most important rules for Functional Dependency in Database:

Reflexive rule: If X is a set of attributes and Y is a subset of X, then X → Y holds.

Augmentation rule: If X → Y holds and Z is a set of attributes, then XZ → YZ also holds. That is, adding the same attributes to both sides does not change the basic dependency.

Transitivity rule: This rule is similar to the transitive rule in algebra: if X → Y holds and Y → Z holds, then X → Z also holds. In X → Y, X is said to functionally determine Y.
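These rules can be exercised with a small attribute-closure routine. This is a minimal sketch, not from the text; the emp_no/emp_name/dept dependencies are hypothetical:

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of FDs.

    fds is a list of (lhs, rhs) pairs of frozensets. Repeatedly applying
    every FD whose left side is already contained in the result yields the
    same consequences as the reflexivity, augmentation, and transitivity
    rules combined.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Hypothetical FDs: emp_no -> emp_name and emp_name -> dept.
fds = [
    (frozenset({"emp_no"}), frozenset({"emp_name"})),
    (frozenset({"emp_name"}), frozenset({"dept"})),
]
# By transitivity, emp_no also determines dept.
print(sorted(closure({"emp_no"}, fds)))  # ['dept', 'emp_name', 'emp_no']
```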

Types of Functional Dependencies in DBMS

There are mainly four types of Functional Dependency in DBMS. Following are the types of
Functional Dependencies in DBMS:

 Multivalued Dependency

 Trivial Functional Dependency

 Non-Trivial Functional Dependency

 Transitive Dependency

Multivalued Dependency in DBMS:

Multivalued dependency occurs in the situation where there are multiple independent
multivalued attributes in a single table. A multivalued dependency is a complete constraint
between two sets of attributes in a relation. It requires that certain tuples be present in a relation.
Consider the following Multivalued Dependency.

Example: maf_year and color are independent of each other but dependent on car_model. In
this example, these two columns are said to be multivalue dependent on car_model.

Car_model Maf_year Color

H001 2017 Metallic

H001 2017 Green

H005 2018 Metallic

H005 2018 Blue


H010 2015 Metallic

H033 2012 Gray

• This dependency can be represented (using the double-arrow notation for multivalued dependencies) as:

• car_model ->> maf_year

• car_model ->> colour

Trivial Functional Dependency in DBMS:

A functional dependency X -> Y is called trivial if the attributes on the right-hand side are already included in the left-hand side, i.e., if Y is a subset of X. Let's understand this with an example of a trivial functional dependency.

Example:

An employee table with three attributes:

Emp_id, Emp_name, Emp_address

Emp_id Emp_name Emp_address

AS555 Harry London

AS811 George Tokyo

AS999 Kevin Francisco

• {Emp_id, Emp_name} -> Emp_name is a trivial functional dependency, as

[Emp_name is a subset of {Emp_id, Emp_name}].

Non Trivial Functional Dependency in DBMS:

A functional dependency A -> B is known as a non-trivial dependency when B is not a subset of A. In other words, in a relation, if attribute B is not a subset of attribute A, then the dependency is considered non-trivial.

Example:

• Emp_id -> emp_name (emp_name is not a subset of emp_id)

• Emp_id -> emp_address (emp_address is not a subset of emp_id)
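The trivial/non-trivial distinction reduces to a subset test, sketched below (an illustration, not from the text):

```python
def is_trivial(lhs, rhs):
    """An FD X -> Y is trivial exactly when Y is a subset of X."""
    return set(rhs) <= set(lhs)

# Trivial: the right-hand side is contained in the left-hand side.
print(is_trivial({"Emp_id", "Emp_name"}, {"Emp_name"}))  # True
# Non-trivial: Emp_address is not a subset of {Emp_id}.
print(is_trivial({"Emp_id"}, {"Emp_address"}))           # False
```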

Transitive Dependency in DBMS:

A transitive dependency is a type of functional dependency which happens when a dependency is indirectly formed by two other functional dependencies.

Example:

Company CEO Age

Microsoft Satya Nadella 51

Google Sundar Pichai 46

Alibaba Jack Ma 54

• {Company} -> {CEO} (if we know the company, we know its CEO's name)

• {CEO} -> {Age} (if we know the CEO, we know the age)

• Therefore, according to the rule of transitive dependency:

• {Company} -> {Age} should hold. That makes sense, because if we know the
company name, we can determine its CEO's age.

Decomposition
Decomposition is the process of breaking a relation down into parts or elements. It replaces a relation with a collection of smaller relations, i.e., it breaks one table into multiple tables in a database. If the relation is not decomposed properly, it may lead to problems such as loss of information.
Decomposition in DBMS removes
• redundancy
• anomalies
• inconsistencies
from a database by dividing the table into multiple tables.
The following are the types:

Lossless Decomposition:
Decomposition is lossless if it is feasible to reconstruct relation R from the decomposed tables using joins. This is the preferred choice: no information is lost from the relation when it is decomposed, and the join results in the same original relation.
• Consider a relation R which is decomposed into sub relations R1, R2, …, Rn.
• This decomposition is called a lossless join decomposition when the join of the sub
relations results in the same relation R that was decomposed.
• For a lossless join decomposition, we always have
• R1 ⋈ R2 ⋈ R3 … ⋈ Rn = R, where ⋈ is the natural join operator.

Example-
Consider the following relation R( A , B , C ):
A B C

1 2 1

2 5 3

3 3 3

Consider this relation is decomposed into two sub relations R1( A , B ) and R2( B , C )-

A B

1 2

2 5

3 3

B C

2 1

5 3

3 3

Joining R1 and R2 on the common attribute B gives:

A B C

1 2 1

2 5 3

3 3 3

• This relation is the same as the original relation R.

• Thus, we conclude that the above decomposition is a lossless join decomposition.
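The check above can be reproduced in a few lines of Python. This is an illustrative sketch, not from the text; relations are modeled as sets of tuples:

```python
def natural_join(r1, cols1, r2, cols2):
    """Natural join of two relations given as iterables of tuples plus
    their column-name lists; rows combine when common columns agree."""
    common = [c for c in cols1 if c in cols2]
    out_cols = cols1 + [c for c in cols2 if c not in cols1]
    rows = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[cols1.index(c)] == t2[cols2.index(c)] for c in common):
                merged = dict(zip(cols1, t1))
                merged.update(zip(cols2, t2))
                rows.add(tuple(merged[c] for c in out_cols))
    return rows, out_cols

# R(A, B, C) from the example, decomposed into R1(A, B) and R2(B, C).
R = {(1, 2, 1), (2, 5, 3), (3, 3, 3)}
R1 = {(a, b) for (a, b, c) in R}
R2 = {(b, c) for (a, b, c) in R}

joined, cols = natural_join(R1, ["A", "B"], R2, ["B", "C"])
print(joined == R)  # True: R1 ⋈ R2 = R, so the decomposition is lossless
```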
EXAMPLE TABLE:
Let us see an example:
<EmpInfo>
Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name
E001 Jacob 29 Alabama Dpt1 Operations
E002 Henry 32 Alabama Dpt2 HR
E003 Tom 22 Texas Dpt3 Finance
Decompose the above table into two tables:
<EmpDetails>
Emp_ID Emp_Name Emp_Age Emp_Location
E001 Jacob 29 Alabama
E002 Henry 32 Alabama
E003 Tom 22 Texas

<DeptDetails>
Dept_ID Emp_ID Dept_Name
Dpt1 E001 Operations
Dpt2 E002 HR
Dpt3 E003 Finance

Now, Natural Join is applied on the above two tables:


The result will be:
Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name
E001 Jacob 29 Alabama Dpt1 Operations
E002 Henry 32 Alabama Dpt2 HR
E003 Tom 22 Texas Dpt3 Finance

Therefore, the above relation has a lossless decomposition, i.e., no loss of information.

Lossy Decomposition:
When a relation is decomposed into two or more relational schemas, the loss of information is unavoidable when the original relation is retrieved.

• Consider a relation R which is decomposed into sub relations R1, R2, …, Rn.
• This decomposition is called a lossy join decomposition when the join of the sub relations
does not result in the same relation R that was decomposed.
• The natural join of the sub relations is always found to have some extraneous tuples.
• For a lossy join decomposition, we always have
• R1 ⋈ R2 ⋈ R3 … ⋈ Rn ⊃ R, where ⋈ is the natural join operator.
Example-
Consider the following relation R( A , B , C ).

A B C

1 2 1

2 5 3

3 3 3

R( A , B , C )
 Consider this relation is decomposed into two sub relations as R1( A , C ) and R2( B , C ).

The two sub relations are-

A C

1 1

2 3

3 3

R1( A , C )

B C

2 1

5 3

3 3

R2( B , C )

A B C

1 2 1
2 5 3

2 3 3

3 5 3

3 3 3

Let us check whether this decomposition is lossy or not. For a lossy decomposition, we must have
R1 ⋈ R2 ⊃ R
Performing the natural join ( ⋈ ) of the sub relations R1 and R2 gives the table above, which contains the extraneous tuples (2, 3, 3) and (3, 5, 3). Hence R1 ⋈ R2 ⊃ R, and the decomposition is lossy.
EXAMPLE TABLE:
Let us see an example:
<EmpInfo>
Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name
E001 Jacob 29 Alabama Dpt1 Operations
E002 Henry 32 Alabama Dpt2 HR
E003 Tom 22 Texas Dpt3 Finance
Decompose the above table into two tables:
<EmpDetails>
Emp_ID Emp_Name Emp_Age Emp_Location
E001 Jacob 29 Alabama
E002 Henry 32 Alabama
E003 Tom 22 Texas

<DeptDetails>
Dept_ID Dept_Name
Dpt1 Operations
Dpt2 HR
Dpt3 Finance

Now, you won’t be able to join the above tables, since Emp_ID isn’t part of
the DeptDetails relation.
Therefore, the above relation has lossy decomposition.
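The extraneous-tuple effect from the numeric example earlier can be verified directly. An illustrative sketch, not from the text:

```python
# R(A, B, C) from the numeric example, decomposed into R1(A, C) and R2(B, C).
R = {(1, 2, 1), (2, 5, 3), (3, 3, 3)}
R1 = {(a, c) for (a, b, c) in R}
R2 = {(b, c) for (a, b, c) in R}

# Natural join on the common attribute C.
joined = {(a, b, c) for (a, c) in R1 for (b, c2) in R2 if c == c2}

print(sorted(joined))
# [(1, 2, 1), (2, 3, 3), (2, 5, 3), (3, 3, 3), (3, 5, 3)]
print(joined > R)  # True: strict superset, i.e. extraneous tuples -> lossy
```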

Normalization 
Normalization is the process of minimizing redundancy from a relation or set of relations.
Redundancy in relation may cause insertion, deletion and updation anomalies. So, it helps to
minimize the redundancy in relations. Normal forms are used to eliminate or reduce redundancy
in database tables.
Advantages of Functional Dependency:
• Functional dependency avoids data redundancy, so the same data does not repeat
at multiple locations in the database.
• It helps you to maintain the quality of data in the database.
• It helps you to define the meanings and constraints of databases.
• It helps you to identify bad designs.
• It helps you to find the facts regarding the database design.
Normalization rules are divided into the following normal forms:

1. First Normal Form

2. Second Normal Form

3. Third Normal Form

4. BCNF

First Normal Form (1NF):

First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if and only if the domain of each attribute contains only atomic (indivisible) values, and the value of each attribute contains only a single value from that domain.

Example :

Student courses

student courses
Jane Smith Databases, Mathematics
John Lipinsky English Literature, Databases
Dave Beyer English Literature, Mathematics

This relation is not in 1NF because the courses attribute has multiple values. Jane Smith is
assigned to two courses (Databases and Mathematics), and they are stored in one field as a
comma-separated list. This list can be broken down into smaller elements (i.e. course subjects:
databases as one element, mathematics as another), so it’s not an atomic value.
To transform this relation to the first normal form, we should store each course subject as a
single value, so that each student-course assignment is a separate tuple:

Student courses

Student Course
Jane Smith Databases
Jane Smith Mathematics
John Lipinsky English Literature
John Lipinsky Databases
Dave Beyer English Literature
Dave Beyer Mathematics
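The transformation to 1NF amounts to splitting each comma-separated list into separate tuples, sketched below (illustrative Python, not from the text):

```python
# Non-1NF data: each student's courses stored as one comma-separated string.
students = {
    "Jane Smith": "Databases, Mathematics",
    "John Lipinsky": "English Literature, Databases",
    "Dave Beyer": "English Literature, Mathematics",
}

# 1NF: one (student, course) tuple per assignment, atomic values only.
rows = [(name, course.strip())
        for name, courses in students.items()
        for course in courses.split(",")]

for row in sorted(rows):
    print(row)
```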
Example 2:
First Normal Form is defined in the definition of relations (tables) itself. This rule states that all the attributes in a relation must have atomic domains. The values in an atomic domain are indivisible units.

We re-arrange the relation (table), as in the student-courses example above, to convert it to First Normal Form.

Second Normal Form (2NF):

2NF is a normal form used in database normalization. A relation is in second normal form if it fulfills the following two requirements: it is in first normal form, and it does not have any non-prime attribute that is functionally dependent on any proper subset of any candidate key of the relation.
or
Second Normal Form (2NF) is based on the concept of full functional dependency. A relation
is in 2NF if it has No Partial Dependency, i.e., no non-prime attribute (attributes which are not
part of any candidate key) is dependent on any proper subset of any candidate key of the table.
Partial dependency occurs when a non-prime attribute is functionally dependent on part of a candidate key. (A non-prime attribute is an attribute that is not part of any candidate key.) Second Normal Form eliminates partial dependency. For example, consider R{ABCD} with the functional dependencies AB -> CD and A -> C: here A -> C is a partial dependency, because the non-prime attribute C depends on A, a proper subset of the candidate key AB.

Student_prof (IDst, IDprof, LastName, Prof, Grade)

FD1: IDst -> LastName
FD2: IDprof -> Prof
FD3: IDst, IDprof -> Grade

Decomposing on these dependencies gives three relations, each covering one dependency:

Std (IDst, LastName)        covers FD1
Professor (IDprof, Prof)    covers FD2
Grade (IDst, IDprof, Grade) covers FD3


EMP_PROJ

Ssn Pnumber Hours Ename Pname Plocaction

1001 1 14hrs Smith Monitor Newyork

1002 2 26hrs Miller Speaker Francisco

1003 3 67hrs Jones Mouse Chicago

Ep1

Ssn Pnumber Hours
1001 1 14hrs
1002 2 26hrs
1003 3 67hrs

Ep2

Ssn Ename
1001 Smith
1002 Miller
1003 Jones

Ep3

Pnumber Pname Plocation
1 Monitor Newyork
2 Speaker Francisco
3 Mouse Chicago

Third Normal Form (3NF)

Third normal form (3NF) is a database schema design approach for relational databases which uses normalizing principles to reduce the duplication of data and avoid data anomalies.

A table design is said to be in 3NF if both the following conditions hold:

 Table must be in 2NF


 Transitive functional dependency of non-prime attribute on any super key should be
removed.

An attribute that is not part of any candidate key is known as non-prime attribute.

In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for
each functional dependency X-> Y at least one of the following conditions hold:

 X is a super key of table


 Y is a prime attribute of table

An attribute that is a part of one of the candidate keys is known as prime attribute.
A transitive dependency in a database is an indirect relationship between values in the same
table that causes a functional dependency. To achieve the normalization standard of Third
Normal Form (3NF), you must eliminate any transitive dependency.
An indirect relationship between data elements in a database. The rule is essentially that C is transitively dependent on A (giving A -> C) if A -> B and B -> C hold, but B -> A does not.
Boyce–Codd normal form ( BCNF )
It is used in database normalization. It is a slightly stronger version of the third normal
form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address
certain types of anomalies not dealt with by 3NF as originally defined.

Rules for BCNF


For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two
conditions:

1. It should be in the Third Normal Form.


2. And, for any dependency A → B, A should be a super key.

The second point sounds a bit tricky, right? In simple words, it means that for a dependency A → B, A cannot be a non-prime attribute if B is a prime attribute.
Consider, for example, a relation (student_id, p_id, professor, subject) in which each professor teaches exactly one subject: professor -> subject then violates BCNF, because professor is not a super key. To make this relation (table) satisfy BCNF, we will decompose it into two
tables, a student table and a professor table.
Below we have the structure for both the tables.
Student Table

student_id p_id

101 1

101 2

Professor Table

p_id professor subject

1 P.Java Java

2 P.Cpp C++
Decomposition Algorithms

In the previous section, we discussed decomposition and its types with the help of small examples. In the real world, a database schema is too large to handle by inspection. Thus, it requires algorithms that can generate appropriate decompositions.

Here, we will get to know the decomposition algorithms using functional dependencies for two
different normal forms, which are:

o Decomposition to BCNF
o Decomposition to 3NF

Decomposition using functional dependencies aims at dependency preservation and lossless decomposition.

Decomposition to BCNF

Before applying the BCNF decomposition algorithm to the given relation, it is necessary to test if
the relation is in Boyce-Codd Normal Form. After the test, if it is found that the given relation is
not in BCNF, we can decompose it further to create relations in BCNF.

The following cases must be tested to check whether a given relation schema R satisfies the BCNF rule:

Case 1: To check whether a nontrivial dependency α -> β violates BCNF, compute α+ (the attribute closure of α) and verify whether α+ includes all the attributes of the given relation R, i.e., whether α is a superkey of R. If α is not a superkey, the dependency violates BCNF.
Case 2: If the given relation R is in BCNF, it is not required to test all the dependencies in F+. It suffices to determine and check the dependencies in the provided dependency set F, because if no dependency in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF.

BCNF Decomposition Algorithm

This algorithm is used when the given relation R is not in BCNF and must be decomposed into several relations R1, R2, …, Rn. To test whether a relation Ri in the result is in BCNF: for every subset α of the attributes in Ri, check that α+ (the attribute closure of α under F) either includes all the attributes of Ri or includes no attribute of Ri - α.

result = {R};
done = false;
compute F+;
while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
    then begin
        let α -> β be a nontrivial functional dependency that holds
        on Ri such that α -> Ri is not in F+, and α ∩ β = ∅;
        result = (result - Ri) ∪ (Ri - β) ∪ (α, β);
    end
    else done = true;

This algorithm decomposes the given relation R into several smaller relations, using the dependencies that violate BCNF to drive the decomposition. It not only generates relations in BCNF but also guarantees a lossless decomposition: no data is lost while decomposing the given relation R into R1, R2, and so on.
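A runnable sketch of this algorithm, with one simplification: only the dependencies in the given set F (projected onto each Ri) are tested, rather than all of F+. The relation R(A, B, C) with A -> B is a hypothetical example, not from the text:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violation(rel, fds):
    """Return a nontrivial FD (alpha, beta) violating BCNF on rel, or None.

    Simplification: only the given FDs are projected onto rel and tested.
    """
    for lhs, rhs in fds:
        alpha = lhs & rel
        beta = (rhs & rel) - alpha
        # A violation requires alpha not to be a superkey of rel.
        if alpha and beta and not rel <= closure(alpha, fds):
            return alpha, beta
    return None

def bcnf_decompose(rel, fds):
    """Decompose rel into BCNF: replace Ri by (Ri - beta) and (alpha, beta)."""
    result = [frozenset(rel)]
    done = False
    while not done:
        for i, ri in enumerate(result):
            violation = bcnf_violation(ri, fds)
            if violation is not None:
                alpha, beta = violation
                result[i:i + 1] = [ri - beta, alpha | beta]
                break
        else:
            done = True
    return result

# Hypothetical R(A, B, C) with A -> B; A is not a superkey, so R is not in BCNF.
fds = [(frozenset("A"), frozenset("B"))]
for r in bcnf_decompose(set("ABC"), fds):
    print(sorted(r))  # R splits into (A, C) and (A, B)
```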

Decomposition to 3NF

The decomposition algorithm for 3NF ensures the preservation of dependencies by explicitly
building a schema for each dependency in the canonical cover. It guarantees that at least one
schema must hold a candidate key for the one being decomposed, which in turn ensures the
decomposition generated to be a lossless decomposition.

3NF Decomposition Algorithm

let Fc be a canonical cover for F;
i = 0;
for each functional dependency α -> β in Fc
    i = i + 1;
    Ri = α β;
if none of the schemas Rj, j = 1, 2, …, i holds a candidate key for R
then
    i = i + 1;
    Ri = any candidate key for R;
/* Optionally, remove the redundant relations */
repeat
    if any schema Rj is contained in another schema Rk
    then
        /* delete Rj */
        Rj = Ri;
        i = i - 1;
until no more Rjs can be deleted
return (R1, R2, …, Ri)

Here, R is the given relation, and F is the given set of functional dependency for which
Fc maintains the canonical cover. R1, R2, . . . , Ri are the decomposed parts of the given relation
R. Thus, this algorithm preserves the dependency as well as generates the lossless decomposition
of relation R.

A 3NF algorithm is also known as a 3NF synthesis algorithm. It is called so because the normal
form works on a dependency set, and instead of repeatedly decomposing the initial schema, it
adds one schema at a time.

Drawbacks of 3NF Decomposing Algorithm


o The result of the decomposing algorithm is not uniquely defined because a set of
functional dependencies can hold more than one canonical cover.
o In some cases, the result of the algorithm depends on the order in which it considers the
dependencies in Fc.
o If the given relation is already in third normal form, the algorithm may still
decompose it further.

NOTE: Whenever a user updates the database, the system must check whether any of the functional
dependencies are getting violated in this process. A canonical cover of a set of functional
dependencies F is a simplified set of functional dependencies.
Modeling temporal data

Temporal data is simply data that represents a state in time, such as the land-use patterns of a region over several years. Temporal data is collected to analyze weather patterns and other environmental variables, and to monitor traffic conditions.

A temporal database stores data relating to time instances. It offers temporal data types and
stores information relating to past, present and future time. Temporal databases could be uni-
temporal, bi-temporal or tri-temporal.
More specifically the temporal aspects usually include valid time, transaction time or decision
time.

 Valid time is the time period during which a fact is true in the real world.
 Transaction time is the time at which a fact was recorded in the database.
 Decision time is the time at which the decision was made about the fact.

A temporal table is a table that records the period of time when a row is valid with respect to system time (or transaction time, when the transaction is recorded), business time (or valid time, when the data is valid with respect to information about the real world), or both.

You can use the workbench to model data that is based on time. Use the temporal features of
the workbench to create and modify temporal objects.

A period is an interval of time that is defined by two date or time columns in a temporal table. A
period contains a begin column and an end column. The begin column indicates the beginning of
the period, and the end column indicates the end of the period.
The beginning value of a period is inclusive, but the ending value of a period is exclusive. For
example, if the begin column has a value of 01/01/1995, that date is included in the row.
Whereas, if the end column has a value of 03/21/1995, that date is not within the period of the
row.
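The begin-inclusive, end-exclusive convention is the familiar half-open interval; a small sketch, using the dates from the example above:

```python
from datetime import date

def in_period(day, begin, end):
    """Begin is inclusive, end is exclusive, as described above."""
    return begin <= day < end

begin, end = date(1995, 1, 1), date(1995, 3, 21)
print(in_period(date(1995, 1, 1), begin, end))   # True: begin date is in the row
print(in_period(date(1995, 3, 21), begin, end))  # False: end date is not
```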
The workbench supports three types of temporal tables (DB2 Version 10 in new-function mode, or DB2 for Linux, UNIX, and Windows Version 10 only):

System period

A system period table maintains historical information that is based on system time
(or transaction time, when the transaction is recorded). The system period consists of a
pair of columns with system-maintained values that indicate the period of time when a
transaction occurred. The begin column contains the date or timestamp value for when a
row is created, either by an insert operation or an update operation on an existing row.
The end column contains the timestamp value for when the row is no longer valid; the
value is entered here when a row is updated or deleted.

The system period is meaningful because of system-period data versioning. System-period
data versioning specifies that updated or deleted rows are archived into another
table. The table that contains the current active rows of a table is called the system-period
temporal table. The table that contains the archived rows is called the history table. If you
have the correct authorization, you can delete the rows from the history table when those
rows are no longer needed.

When you define a base table to use system-period data versioning, or when you define
system-period data versioning on an existing table, you must create a history table,
specify a name for the history table, and then you can create a table space to hold that
table.

Application period

An application period table maintains historical information that is based on
business time (or valid time, when the data is valid with respect to information about
the real world).

The application period consists of a pair of columns with application-maintained values
that indicate the period of time when a row is valid with respect to information about the
real world. The begin column contains the date or timestamp value for when a real-world
event or state begins. The end column contains the value for when a row stops
being valid. A table with only an application period is called an application-period
temporal table. When you use the application period, determine whether you need DB2 to
enforce uniqueness across time. You can create a primary key and specify that the values
in that key must be unique within a period.

Bitemporal

A bitemporal table maintains historical information that is based on both system
time and business time. You can use a bitemporal table to keep both application-period
information and system-based historical information. Therefore, you have a lot of
flexibility in how you query data based on periods of time.

INDEXING

Indexes are used to quickly locate data without having to search every row in a database table every time the table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
An index is a list of data, such as a group of files or database entries, stored in a form that can be quickly scanned by a search algorithm. This significantly speeds up searching and sorting operations on the data referenced by the index.

Types of Indexing

Indexing in Database is defined based on its indexing attributes. Two main types of indexing
methods are:

 Primary Indexing
 Secondary Indexing

Primary Index

A primary index is an ordered, fixed-length file with two fields. The first field is the same as the primary key, and the second field points to the specific data block. In a primary index, there is always a one-to-one relationship between the entries in the index table and the records in the data file.

The primary Indexing in DBMS is also further divided into two types.

 Dense Index
 Sparse Index

Dense Index

In a dense index, an index record is created for every search key value in the database. This helps you to search faster but needs more space to store the index records. In this indexing method, each index record contains a search key value and a pointer to the actual record on the disk.
Sparse Index

A sparse index holds index records for only some of the search-key values in the file. Sparse indexing helps to resolve the space issues of dense indexing. In this technique, a range of index columns shares the same data block address, so when data needs to be retrieved, the block address is fetched and the block is scanned.

Because a sparse index stores index records for only some search-key values, it needs less space and less maintenance overhead for insertions and deletions, but it is slower than a dense index for locating records.

Example :
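A minimal sparse-index sketch (not from the text; the key layout and block size are illustrative): one index entry per block, located by binary search and followed by a scan within the block.

```python
import bisect

# Records sorted by key, laid out in fixed-size "blocks" (hypothetical data).
records = [(k, f"row{k}") for k in range(0, 100, 5)]   # keys 0, 5, ..., 95
BLOCK = 4                                              # records per block

# Sparse index: one entry per block -> (first key in the block, block number).
sparse = [(records[i][0], i // BLOCK) for i in range(0, len(records), BLOCK)]
index_keys = [k for k, _ in sparse]

def lookup(key):
    """Locate the candidate block via the sparse index, then scan the block."""
    blk = sparse[bisect.bisect_right(index_keys, key) - 1][1]
    for k, value in records[blk * BLOCK:(blk + 1) * BLOCK]:
        if k == key:
            return value
    return None

print(lookup(35))  # row35
print(lookup(7))   # None (key absent)
```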

Secondary Index

A secondary index in DBMS can be generated from a field which has a unique value for each record, and it should be a candidate key. It is also known as a non-clustering index.

This two-level database indexing technique is used to reduce the mapping size of the first level. For the first level, a large range of numbers is selected so that the mapping size always remains small.

Example of secondary Indexing

Let's understand secondary indexing with a database index example:


In a bank account database, data is stored sequentially by acc_no, but you may want to find all accounts belonging to a specific branch of ABC bank.

Here, a secondary index is built on the branch search key. Each index record points to a bucket that contains pointers to all the records with that specific search-key value.

Clustering Index

In a clustered index, the records themselves are stored in the index rather than pointers. Sometimes the index is created on non-primary-key columns, which might not be unique for each record. In that case, you can group two or more columns to obtain unique values and create an index, which is called a clustered index. This also helps you to retrieve records faster.

Example:

Let's assume that a company recruited many employees in various departments. In this case,
clustering indexing in DBMS should be created for all employees who belong to the same dept.

The group is considered a single cluster, and the index points to the cluster as a whole. Here, Department_no is a non-unique key.

What is Multilevel Index?

A multilevel index is created when the primary index does not fit in memory. In this indexing method, you reduce the number of disk accesses needed to locate any record: the index itself is kept on disk as a sequential file, and a sparse index is built on top of it.
B+ Tree
A B+ tree is a balanced search tree that follows a multi-level index format. The leaf nodes of a B+ tree hold the actual data pointers. A B+ tree ensures that all leaf nodes remain at the same height and is thus balanced. Additionally, the leaf nodes are connected in a linked list; therefore, a B+ tree can support sequential access as well as random access.
Structure of B+ Tree
Every leaf node is at equal distance from the root node. A B+ tree is of order n, where n is fixed for every B+ tree.

B-Trees are specialized m-way search trees, widely used for disk access. A B-tree of order m can have at most m-1 keys and m children. Because a single node can store a large number of elements, the height of the tree is relatively small. This is one great advantage of B-Trees.
A B-Tree has all of the properties of an m-way tree, plus some additional properties:
 Every node in a B-Tree holds at most m children
 Every node, except the root and leaves, holds at least m/2 children
 The root node must have at least two children.
Internal nodes −

 Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
 At most, an internal node can contain n pointers.
Leaf nodes −

 Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
 At most, a leaf node can contain n record pointers and n key values.
 Every leaf node contains one block pointer P to point to next leaf node and forms a
linked list.
B+ Tree Insertion
 B+ trees are filled from bottom and each entry is done at the leaf node.
 If a leaf node overflows −
o Split node into two parts.
o Partition at i = ⌊(m+1)/2⌋.
o First i entries are stored in one node.
o Rest of the entries (i+1 onwards) are moved to a new node.
o ith key is duplicated at the parent of the leaf.
 If a non-leaf node overflows −
o Split node into two parts.
o Partition the node at i = ⌈(m+1)/2⌉.
o Entries up to i are kept in one node.
o Rest of the entries are moved to a new node.
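The leaf-split rule above can be sketched directly (illustrative, not from the text; m is the order of the tree):

```python
def split_leaf(keys, m):
    """Split an overflowing leaf per the rules above.

    Partition at i = floor((m + 1) / 2): the first i keys stay in the old
    leaf, the rest move to a new leaf, and the first key of the new leaf
    is duplicated at the parent."""
    i = (m + 1) // 2
    left, right = keys[:i], keys[i:]
    copied_up = right[0]
    return left, right, copied_up

# An order-4 leaf overflowing with five keys.
left, right, up = split_leaf([50, 55, 60, 65, 70], 4)
print(left, right, up)  # [50, 55] [60, 65, 70] 60
```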

B+ Tree Deletion
 B+ tree entries are deleted at the leaf nodes.
 The target entry is searched and deleted.
o If it is an internal node, delete and replace with the entry from the left position.
 After deletion, underflow is tested,
o If underflow occurs, distribute the entries from the nodes left to it.
 If distribution is not possible from left, then
o Distribute from the nodes right to it.
 If distribution is not possible from left or from right, then
o Merge the node with left and right to it.

B+ Tree Insertion Example

Suppose we want to insert a record 60 in the below structure. It will go to the 3rd leaf node after
55. It is a balanced tree, and a leaf node of this tree is already full, so we cannot insert 60 there.

In this case, we have to split the leaf node so that it can be inserted into the tree without affecting the fill factor, balance, and order.

The 3rd leaf node has the values (50, 55, 60, 65, 70), and the intermediate node above it currently holds 50. We split the leaf node in the middle so that the tree's balance is not altered, grouping (50, 55) and (60, 65, 70) into two leaf nodes.

If these two have to be leaf nodes, the intermediate node cannot branch from 50 alone. It should have 60 added to it, and then we can have pointers to the new leaf node.
This is how we can insert an entry when there is overflow. In a normal scenario, it is very easy to
find the node where it fits and then place it in that leaf node.

B+ Tree Deletion Example

Suppose we want to delete 60 from the above example. In this case, we have to remove 60 from
the intermediate node as well as from the 4th leaf node too. If we remove it from the
intermediate node, then the tree will not satisfy the rule of the B+ tree. So we need to modify it to
have a balanced tree.

After deleting node 60 from above B+ tree and re-arranging the nodes, it will show as follows:
