You are on page 1of 12

Understanding CRUD Operations on Tables with B-tree

Indexes, Page-splits, and Fragmentation

Introduction
Every DML transaction reads the data before it makes any changes. Not only during a
SELECT query, but when you run any DML statement, insert, update, or delete, SQL
Server first fetches a bunch of pages into the buffer pool locating the desired rows and
changes them while synchronously writing to the transaction log file.

While indexes on tables help improve read performance, they add overhead during
modifications to the data. Every insert, update or delete that affects the columns in the
base table, a heap or a clustered index, also occurs on any non-clustered indexes that
contain that column.

Before we dive into the DML commands, let us first understand the basic B-tree
structure.
In a b-tree structure, SQL Server sorts the data pages in the order of the index key
column(s). The pages are connected as doubly linked lists to allow for scanning the
pages sequentially either in the forward or reverse direction. Each index page header
contains the page addresses of the previous and the next page. A b-tree index has
three types of levels: root, intermediate, and leaf.

A root page sits at the top level with rows pointing to the pages below it. The next level
down from the root is an intermediate level consisting of index pages with each row
pointing to the page at the next level below it. There can be multiple intermediate levels
if a table is large. The last intermediate level points to the leaf level consisting of leaf
pages, which could be data pages if the b-tree in question is a clustered index and
index pages if it's a non-clustered index. If the leaf pages are index pages, they contain
pointers to the base table, which may be a heap or a clustered index.

Insert
When inserting a new row, SQL Server reads the data page on which it must insert it
into the buffer pool unless the page doesn't already exist. In this case SQL Server 
allocates a new extent. Then it inserts the row on the page, synchronously writes the
corresponding log record to the transaction log, and asynchronously flushes the dirty
page to the data file on disk via either checkpoint or lazy writer.

SQL Server knows which page to store the row on when inserting a row into a table with
a clustered index. For example, if there is a page with clustering key column values 1 to
10. When we insert a new row with the key column value 11, SQL Server tries to
accommodate it on the same page so long as there is enough space. If there is none,
SQL Server allocates a new page and links it to the previous page to maintain the order.
Suppose it must insert a row on a page with no room, say, a row with column value 5 in
our example. In that case, SQL Server splits the page, i.e., it allocates a new page and
moves close to 50% of the rows from the previous page to the newly allocated page and
performs the insert on either the old or new page depending on the index key value.

While this page-split process maintains the rows' logical ordering based on the
clustering key column, it is costly. SQL Server has to unlink the old page, create a new
link to the newly allocated page, and move the rows there. Also, an entry for the new
page is added to the parent page above it. The operation is fully logged, resulting in
increased transaction logging, which could have dire consequences if the database is
part of a feature that relies on transaction log such as Availability Groups.

Page splitting also causes logical (or external) fragmentation. If the new page is not
physically contiguous to the page being split in a page-split, which is almost always the
case, it affects performance. Especially during read-ahead reads, a performance
optimization mechanism where SQL Server, in anticipation, reads a bunch of extra
pages into memory before they are even needed by issuing large I/O requests. By not
having the neighboring pages physically contiguous because of page-splits, SQL Server
has to move back and forth when reading them. The rows' logical order doesn't follow
the physical order, thus causing logical fragmentation that is detrimental to
performance. Page-splits can happen in the index root page, intermediate index pages,
leaf-level data pages of a clustered index, or leaf-level index pages of a non-clustered
index.

A page can undergo multiple splits if the new row cannot be accommodated after a
single split. Also, a page split executes as an internal transaction, meaning if you ran an
insert transaction and it caused a page to split and cancel it, while SQL Server rolls
back the insert, it doesn't undo the split. Also, SQL Server exclusively locks the page
being split when it splits between two pages, meaning the rows on the page are not
accessible to sessions needing them.

Fragmentation in clustered indexes often occurs when the clustering key is not
increasing, meaning if the key column values are randomly generated, there is no
guarantee that new values will be before or after the existing values. For example,
suppose a clustered index uses a uniqueidentifier data type with NEWID() function to
generate values randomly. Although the values will be unique, they can fall before or
after the existing values, leading to page-splits and cause fragmentation. The same
explanation holds for non-clustered indexes.

Index Root Page-split

A splitting of the index root page is not common as indexes usually have a few levels,
and for the root page to split, the table must be huge with bulk inserts. If the index root
page must split to accommodate a new index row, SQL Server allocates two new index
pages. It moves the rows from the root page onto the two new pages, followed by the
new row's insert into one of the two new pages based on its logical order. SQL Server
does not remove the root page. The original rows on the root page now exist on the two
new pages, and the root page now has two new index rows, each pointing to one of the
two new index pages. This splitting of the root page introduces a new intermediate level
in the index, i.e., the two new index pages.

Intermediate Index Page-split

Suppose the intermediate-level index page must split to accommodate a new index row.
In that case, SQL Server allocates a new index page, locates the middle of the index
rows on the page-to-split, and moves the bottom 50% of the rows into the new page
followed by inserting the new row into either the page with top 50% or the new page
with bottom 50% rows depending on its logical order and linking the new page
appropriately. The corresponding parent page above has a new index row inserted with
the physical address and the minimum key column value on the newly allocated page.

Leaf Page Split


By far, the most common page split is the one that occurs at the leaf level of either
clustered or non-clustered index. Like in an intermediate page-split, SQL Server creates
a blank new page and moves 50% of the rows from the old to the new page and inserts
the new row into one of the two pages. While any non-clustered indexes on the table
are not affected because the clustered index key values don't change in a page split, it
affects key lookups' performance because SQL has to perform scattered reads if the
clustered index leaf pages are physically noncontiguous.

Insert Demo

Let us use the same database we created in the heap demo article to see how inserts
can cause fragmentation. The following code creates a new table called Insert_frag with
a uniqueidentifier column using NEWID() function as the clustering key and populates it
with a bunch of rows. We then insert some more records to cause random inserts,
resulting in page-splits and fragmentation.

--Fragmentation because of insert


USE DML_DB
GO
IF NOT EXISTS (
SELECT *
FROM sysobjects
WHERE name = 'Insert_frag'
AND xtype = 'U'
)
BEGIN
CREATE TABLE dbo.Insert_frag (
ID UNIQUEIDENTIFIER CONSTRAINT DF_GUID DEFAULT NEWID()
,Name VARCHAR(7800)
);
END
--insert a bunch of rows
INSERT INTO Insert_frag (Name)
SELECT TEXT
FROM sys.messages
CREATE CLUSTERED INDEX CIX_Insert_frag ON dbo.Insert_frag (ID);
--check fragmentation
USE DML_DB
GO
SELECT index_type_desc
,page_count
,CAST(avg_fragmentation_in_percent AS DECIMAL(6, 2)) AS
avg_frag_in_percent
,fragment_count AS frag_count
,avg_fragment_size_in_pages AS avg_frag_size_in_pages
,CAST(avg_page_space_used_in_percent AS DECIMAL(6, 2)) AS
avg_page_space_used_in_percent
,forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID('DML_DB'),
OBJECT_ID('dbo.Insert_frag'), NULL, NULL, 'DETAILED')
--Random inserts resulting in fragmentation
INSERT INTO Insert_frag (Name)
SELECT name
FROM sys.system_columns
--check fragmentation
SELECT index_type_desc
,page_count
,CAST(avg_fragmentation_in_percent AS DECIMAL(6, 2)) AS
avg_frag_in_percent
,fragment_count AS frag_count
,avg_fragment_size_in_pages AS avg_frag_size_in_pages
,CAST(avg_page_space_used_in_percent AS DECIMAL(6, 2)) AS
avg_page_space_used_in_percent
,forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID('DML_DB'),
OBJECT_ID('dbo.Insert_frag'), NULL, NULL, 'DETAILED')

The top outcome is from before the random inserts. Notice that fragmentation was
almost zero with high page density shown in column,
avg_page_space_used_in_percent column. After random insert, the bottom result
shows a lot of fragmentation and low page density (page fullness) because of page-
splits. Also, notice that the page count increased after the second insert.
Delete Operations
SQL Server performs deletes efficiently. As in any DML operation, SQL Server reads
into memory the data pages from which to delete the data rows. SQL Server doesn't
remove the rows from the slots of the page physically. Instead, it merely flags the
deleted rows as ghosted records by changing the row headers' status bits on those
affected pages. This makes the rollback easier, as it just has to revert the status bits on
those pages, making no physical changes.

SQL Server does the actual removal of the ghost records in the background with a
background thread, called ghost-cleanup. This thread removes the ghost records as
long as any active uncommitted transaction does not require them. The thread also
deallocates the entire page if we remove all the rows from it; however, heap tables have
to be rebuilt to deallocate empty pages. When we delete a row from a table, SQL Server
deletes the corresponding row from any non-clustered indexes since they contain a
pointer to the deleted row.

Deletes result in gaps in the otherwise physically contiguous pages. As an example, say
page 1 has rows with values 1 to 10, page 2 has values 11 to 20, and page 3 has
values 21 to 30. If we delete all the rows from page 2, the page becomes blank and
causes a gap between pages 1 and 3. Page 2 can either be used by future inserts,
updates or deallocated by the background ghost-cleanup thread. So, while deletes do
not cause page splits directly, they create gaps other indexes may use in the future,
potentially resulting in fragmentation, but it isn't as severe as the one introduced by
inserts and updates.

Delete Demo

 The following code creates a new table called Delete_frag with an identity column as
the clustered index and populates it with a bunch of rows. We then delete some random
records to cause gaps resulting in small fragmentation.

--Fragmentation because of delete


USE DML_DB
GO
IF NOT EXISTS (
SELECT *
FROM sysobjects
WHERE name = 'Delete_frag'
AND xtype = 'U'
)
BEGIN
CREATE TABLE dbo.Delete_frag (
ID INT IDENTITY(1, 1)
,Name VARCHAR(3000)
);
END
--insert a bunch of rows
INSERT INTO dbo.Delete_frag (Name)
SELECT name + replicate('a', 2500)
FROM sys.all_views
CREATE CLUSTERED INDEX CIX_Delete_frag ON dbo.Delete_frag (ID);
SELECT index_type_desc
,page_count
,CAST(avg_fragmentation_in_percent AS DECIMAL(6, 2)) AS
avg_frag_in_percent
,fragment_count AS frag_count
,avg_fragment_size_in_pages AS avg_frag_size_in_pages
,CAST(avg_page_space_used_in_percent AS DECIMAL(6, 2)) AS
avg_page_space_used_in_percent
,forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID('DML_DB'),
OBJECT_ID('dbo.Delete_frag'), NULL, NULL, 'DETAILED')
USE DML_DB
GO
--delete some random set of rows to create gaps
--deletes result in page gaps, so fragmentation is usually minimal
DELETE
FROM dbo.Delete_frag
WHERE ID BETWEEN 25
AND 75
DELETE
FROM dbo.Delete_frag
WHERE ID BETWEEN 85
AND 135
DELETE
FROM dbo.Delete_frag
WHERE ID BETWEEN 135
AND 235
waitfor delay '00:00:05'
SELECT index_type_desc
,page_count
,CAST(avg_fragmentation_in_percent AS DECIMAL(6, 2)) AS
avg_frag_in_percent
,fragment_count AS frag_count
,avg_fragment_size_in_pages AS avg_frag_size_in_pages
,CAST(avg_page_space_used_in_percent AS DECIMAL(6, 2)) AS
avg_page_space_used_in_percent
,forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID('DML_DB'),
OBJECT_ID('dbo.Delete_frag'), NULL, NULL, 'DETAILED')

Update Operations
Unless the page containing the row to be updated is already in the buffer pool, SQL
Server physically reads the page from disk, causing physical I/O. It then updates the
row on the page and generates a new transaction log record, which holds enough
information to redo the update if needed during recovery. It then synchronously writes
the log record to the transaction log file on disk. The changed or 'dirty' page is flushed to
the data file on disk at some point.

As long as the clustering key columns do not change, no non-clustered indexes are
affected. However, if the clustering key changes because of an update command, the
corresponding row in the non-clustered index is updated to reflect the new key value. If
an update causes the row to be moved to a different physical location and the clustering
key value remains intact, there is no change to the non-clustered index.

An update results in page splits and index fragmentation when we update a column to a
larger value that can neither replace the old value within the same location nor move to
another location on the same page, leaving page-split as the only option. Fragmentation
can also result when the index key value changes. The new value location is on a
different page with not enough space to accommodate the new row resulting in that
page splitting.

Update demo

The following code creates a new table called Update_frag with an identity column as
the clustered index and populates it with a bunch of rows. We then update some
records to a larger value, causing fragmentation.

--Fragmentation because of update


USE DML_DB
GO
IF NOT EXISTS (
SELECT *
FROM sysobjects
WHERE name = 'Update_frag'
AND xtype = 'U'
)
BEGIN
CREATE TABLE dbo.Update_frag (
ID INT IDENTITY(1, 1)
,Name VARCHAR(3000)
);
END
--insert a bunch of rows
INSERT INTO dbo.Update_frag (Name)
SELECT name + replicate('a', 2500)
FROM sys.all_views
CREATE CLUSTERED INDEX CIX_Update_frag ON dbo.Update_frag (ID);
USE DML_DB
GO
SELECT index_type_desc
,page_count
,CAST(avg_fragmentation_in_percent AS DECIMAL(6, 2)) AS
avg_frag_in_percent
,fragment_count AS frag_count
,avg_fragment_size_in_pages AS avg_frag_size_in_pages
,CAST(avg_page_space_used_in_percent AS DECIMAL(6, 2)) AS
avg_page_space_used_in_percent
,forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID('DML_DB'),
OBJECT_ID('dbo.Update_frag'), NULL, NULL, 'DETAILED')
USE DML_DB
GO
-- update a bunch of rows to a larger value to cause page-splits
UPDATE dbo.Update_frag
SET name = REPLICATE('b', 3000)
WHERE ID > 120
--notice there are no forwarding pointers because of clustered index
SELECT index_type_desc
,page_count
,CAST(avg_fragmentation_in_percent AS DECIMAL(6, 2)) AS
avg_frag_in_percent
,fragment_count AS frag_count
,avg_fragment_size_in_pages AS avg_frag_size_in_pages
,CAST(avg_page_space_used_in_percent AS DECIMAL(6, 2)) AS
avg_page_space_used_in_percent
,forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID('DML_DB'),
OBJECT_ID('dbo.Update_frag'), NULL, NULL, 'DETAILED')

The top outcome is from before the update. Notice fragmentation was zero with high
page density. The bottom result is after the update, which shows fragmentation and low
page density because of page-splits.
Controlling Page-splits and Fragmentation
Page-splits result in updates to multiple pages, which are all fully logged in the
transaction log. The next obvious question is, how do we avoid page splits? One way is
to use an appropriate clustering key that does not insert data into random pages but
does so at the end of the table, thus avoiding inserts into the earlier pages, which would
otherwise cause page-splits.

The recommended characteristics for a clustering key are unique, static, narrow, and
ever-increasing. GUIDs cause pages to split because of their random nature, so they
may not be the best choice for a clustering key. Instead, you may use an ever-
increasing IDENTITY or a DATETIME column. However, with an ever-increasing key,
the last page might become a hot spot resulting in the last page insert contention,
meaning there could be multiple concurrent inserts on the last page in memory, causing
high PAGELATCH_xx waits for the threads. SQL Server 2019 introduced an index
option called OPTIMIZE_FOR_SEQUENTIAL_KEY to improve throughput for high
concurrent inserts into the index, thus lowering the last page insert contention.

Second, use the appropriate value for Fill Factor when creating or rebuilding an index.
The Fill Factor determines how much free space is reserved for potential future inserts
on an index's leaf pages. The correct value for Fill Factor is driven by the type of
workload that runs on the instance. Suppose there is a lot of free space on the pages. In
that case, it leads to low page-density (aka internal fragmentation) and wastage of
memory as SQL Server reads the entire page into memory regardless of whether it is
wholly or partially filled. Likewise, if you choose to fill the pages to their capacity, there is
a high likelihood of page-splits and fragmentation. So there needs to be a careful
evaluation of the workload.

A Fill Factor of 0 is the same as 100, meaning to fill the pages to their capacity. 0 is the
default fill factor value when creating an index and may be changed to suit the
workload. One way is to start with 100, monitor the fragmentation levels, and gradually
adjust the value down until you find an optimal value.

Avoiding variable-length data types also minimizes fragmentation. Data in data types
such as VARCHAR and NVARCHAR can change over time and potentially cause page
splits. Row versioning can also introduce fragmentation. When row versioning is
enabled, SQL Server stores the old version of the records in tempdb's version store and
adds a 14-byte version tag to the rows in the database to link to the rows in the version
store. This 14-byte increase in each row might force rows to move around, causing
page-splits.
Shrinking a database also introduces fragmentation, which we can fix by rebuilding the
index. Shrinking moves the pages from the end of the data file to the front and releases
the free space at the end. SQL Server moves the pages to the front without following
the physical contiguity, introducing fragmentation. Therefore, we should avoid shrinking
unless necessary.

We must follow proper index reorganize and rebuild strategies. Microsoft recommends
rebuilding an index if the fragmentation exceeds 30 percent and reorganize if it's
between 5 and 30 percent. While index rebuild runs as a transaction that, when stopped
rolls back completely, index reorganize can be interrupted without losing the work
completed up to the point where it was stopped. SQL Server 2017 introduced a
resumable online index rebuild feature that allows pausing an index rebuild either
manually or when there is a failure and resuming it afterward.

Conclusion
Indexes must be maintained regularly to remove internal and external fragmentation.
We must carefully evaluate the Fill Factor, as there is no universally accepted value for
it. Choose a unique, static, narrow, and ever-increasing column for a clustered index to
minimize page-splits and fragmentation. Non-clustered indexes should be narrow to
increase page density leading to faster index maintenance, efficient buffer pool
utilization, fewer I/Os, faster backups, and faster DBCC CHECKDB. Inserts, updates,
and deletes can all lead to fragmentation. However, deletes technically cause gaps
between pages which may eventually lead to fragmentation, but it's not as significant as
the fragmentation caused by inserts and updates.

 Understanding the Internals of a Data Page


 Understanding CRUD Operations Against Heaps and Forwarding Pointers
 Understanding CRUD Operations on Tables with B-tree Indexes, Page-splits,
and Fragmentation (this article)

You might also like