You are on page 1of 42

How SQL Server Indexes Work

Sharon F. Dooley sharond@voicenet.com

SQL Server Indexes


SQL Server indexes are based on B-trees Special records called nodes that allow keyed access to data Two kinds of nodes are special Root Leaf Root node A O

Intermediate node

Leaf node

Data pages

SQL Server B-Tree Rules


Root and intermediate nodes point only to other nodes Only leaf nodes point to data The number of nodes between the root and any leaf is the same for all leaves A node always contains between K and K/2 branches, where K is the branching factor Branching factor is the number of keys in the node B-trees are always sorted The tree will be maintained during insertion, deletion, and updating so that these rules are met When records are inserted or updated, nodes may split

When records are deleted, nodes may be collapsed

What Is a Node?
A page that contains key and pointer pairs
Key Key Key Key Key Key Key Key Pointer Pointer Pointer Pointer Pointer Pointer Pointer Pointer

Splitting a B-Tree Node


Abby Bob Carol Dave

Root (Level 0)

Abby

Ada

Andy

Ann

Node (Level 1)

Ada

Alan

Amanda

Amy

Leaf (Level 2)

Bob

Alan

Amanda

Carol

Amy

Dave

Ada

DB

Lets Add Alice


Step 1: Split the leaf node
Ada Alan Alice Amanda Amy

Bob

Alan

Amanda

Carol

Amy

Dave

Ada

Alice

Adding Alice
Step 2: Split the next level up
Abby Ada Amanda Andy Ann

Ada

Alan

Alice

Amanda

Amy

Leaf

Bob

Alan

Amanda

Carol

Amy

Dave

Ada

Alice

DB

Split the root


Abby Andy

Adding Alice (continued)


Bob Carol Dave

Abby

Ada

Amanda

Andy

Ann

Ada

Alan

Alice

Amanda

Amy

Leaf

Bob

Alan

Amanda

Carol

Amy

Dave

Ada

Alice

DB

When the root splits, the tree grows another level


Abby Carol

Adding Alice (continued)

Root (Level 0)
Carol Dave

Abby

Andy

Bob

Node (Level 1) Node (Level 2)


Amy

Abby

Ada

Amanda

Andy

Ann

Ada

Alan

Alice

Amanda

Leaf (Level 3)
Ada Alice

Bob

Alan

Amanda

Carol

Amy

Dave

DB

Page splits cause fragmentation


Two types of fragmentation

Data pages in a clustered table Index pages in all indexes


Fragmentation happens because these pages must be kept in order Data page fragmentation happens when a new record must be added to a page that is full Consider an Employee table with a clustered index on LastName, FirstName Extent

Adams, Carol Ally, Kent Baccus, Mary

David, Sue Dulles, Kelly Edom, Mike

Farly, Lee Frank, Joe

...

Ollen, Carol Oppus, Larry

Data Page Fragmentation


Adams, Carol Ally, Kent Baccus, Mary David, Sue Dent, Peter Farly, Lee Frank, Joe Ollen, Carol Oppus, Larry

Extent

...

Dulles, Kelly Edom, Mike

...

Extent

Index Fragmentation
Index page fragmentation occurs when a new key-pointer pair must be added to an index page that is full Consider an Employee table with a nonclustered index on Social Security Number

036-11-9987, pointer 036-33-9874, pointer 038-87-8373, pointer

046-11-9987, pointer 048-33-9874, pointer 052-87-8373, pointer

116-11-9987, pointer 116-33-9874, pointer

...

124-11-9987, pointer 124-33-9874, pointer 125-87-8373, pointer

Extent

Employee 048-12-9875 is added

Index Fragmentation (continued)


036-11-9987, pointer 036-33-9874, pointer 038-87-8373, pointer 046-11-9987, pointer 048-12-9875, pointer 116-11-9987, pointer 116-33-9874, pointer ... 124-11-9987, pointer 124-33-9874, pointer 125-87-8373, pointer

Extent

048-33-9874, pointer 052-87-8373, pointer

...

Extent

Studying Fragmentation in SQL Server 2000


To determine if there is fragmentation In a clustered table or a nonclustered index DBCC SHOWCONTIG [([table_id | table_name | view_id | view_name [, index_id | index_name])]
DBCC SHOWCONTIG scanning 'Employees' table... Table: 'Employees' (1977058079); index ID: 1, database ID: 7 TABLE level scan performed. - Pages Scanned................................: 707 - Extents Scanned..............................: 90 - Extent Switches..............................: 572 - Avg. Pages per Extent........................: 7.9 - Scan Density [Best Count:Actual Count].......: 15.53% [89:573] - Logical Scan Fragmentation ..................: 39.18% - Extent Scan Fragmentation ...................: 58.89% - Avg. Bytes Free per Page.....................: 4338.9 - Avg. Page Density (full).....................: 46.39% DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Studying Fragmentation in SQL Server 2000 (continued)


Unless the table or index spans multiple files Extent Switches and Extents Scanned should be approximately equal Scan Density should be close to 100 percent Avg. Page Density should be high and Avg. Bytes Free Per Page should be low Logical Scan Fragmentation and Extent Scan Fragmentation should be as close to 0 as possible Clearly the Employees table is terribly fragmented!

Studying Fragmentation in SQL Server 2005 and 2008


SELECT object_name(s.object_id), name, avg_fragmentation_in_percent FROM sys.dm_db_index_physical_stats (db_id('databasename'), object_id('tablename'), {indexid | NULL}, {partitionnumber | NULL}, {'LIMITED' | 'SAMPLED' | 'DETAILED' | NULL}) as s INNER JOIN sys.indexes as i ON s.object_id = i.object_id AND s.index_id = i.index_id If NULL is supplied for the last argument, LIMITED is assumed The avg_fragmentation_in_percent should be as close to 0 as possible

Studying Fragmentation in SQL Server 2005 and 2008 (continued)


SELECT object_name(s.object_id), name, avg_fragmentation_in_percent FROM sys.dm_db_index_physical_stats (db_id('bigwind'), object_id('Employees'), null, null, null) as s INNER JOIN sys.indexes as i ON s.object_id = i.object_id AND s.index_id = i.index_id WHERE alloc_unit_type_desc = 'IN_ROW_DATA'

Results
Employees LastName_IDx 0.685

Employees
Employees Employees

PK_Employees
City_IDX Region_IDX

3.0303
0 3.922

Repairing Fragmentation
Repair index fragmentation by rebuilding index Rebuilding clustered index repairs table fragmentation DBCC DBREINDEX DBCC DBREINDEX (tablename [, indexname [, fillfactor]]) Can reorganize indexes that implement primary key and unique constraints CREATE INDEX DROP_EXISTING causes SQL Server to create and drop the index in a single step Faster than dropping with the DROP INDEX command and then re-creating ALTER TABLE ADD CONSTRAINT PRIMARY KEY or UNIQUE

Repairing Fragmentation (continued)


DBCC INDEXDEFRAG does not Lock the index Do as thorough job as the other methods Allow specification of a fill factor Uses the fill factor from the last CREATE INDEX for this index DBCC INDEXDEFRAG ( { database_name | database_id | 0 } , { table_name | table_id} , { index_name | index_id } ) 2008 ALTER INDEX index_name ON table_name REORGANIZE 2005 Same as DBCC INDEXDEFRAG ALTER INDEX index_name ON table_name REBUILD Allows concurrent access if you add WITH (ONLINE = ON) to the ALTER INDEX command Uses the version store in tempdb Same as DBCC DBREINDEX

SQL Server Indexes


SQL Server indexes come in two flavors Clustered indexes Database rows are in order on the index key The data pages are the leaf nodes of the index Nonclustered indexes Leaf level is in index order but the data is not Leaf nodes contain pointers to rows

One clustered index per table Choose wisely Should always have a clustered index Allows reorganization of the data pages
249 nonclustered indexes per table

Clustered Index
Root
Abby Bob Carol Dave

Abby

Ada

Andy

Ann

Ada

Alan

Amanda

Amy

Database and leaf node

Nonclustered Index
Root
Abby Bob Carol Dave

Abby

Ada

Andy

Ann

Ada

Alan

Amanda

Amy

Leaf node

Amy

Ada

Amanda

Alan

Database

Clustered indexes are always unique If you dont specify unique when creating them, SQL Server may add a uniqueifier to the index key Only used when there actually is a duplicate Adds 4 bytes to the key The clustering key is used in nonclustered indexes This allows SQL Server to go directly to the record from the nonclustered index If there is no clustered index, a record identifier will be used instead Leaf node of a nonclustered index on LastName
Adams Douglas 3 4

Clustered and Nonclustered Indexes Interact

Leaf node of a clustered index on EmployeeID


1
2 3 4

Jones
Smith Adams Douglas

John
Mary Mark Susan

Jones
Smith

1
2

Clustered and Nonclustered Indexes Interact (continued)


Another reason to keep the clustering key small! Consider the following query: SELECT LastName, FirstName FROM Employee WHERE LastName = 'Douglas' When SQL Server uses the nonclustered index, it Traverses the nonclustered index until it finds the desired key Picks up the associated clustering key Traverses the clustered index to find the data

Heaps and Chains


When you place a clustered index on a table, the pages are chained together in a doubly linked list

SQL Server can follow the pointers to move from page to page When there is no clustered index, the table is called a heap Data is located Through nonclustered indexes By scanning all the pages in the table

Indexes and Inserts


When there is no clustered index on a table, SQL Server uses the Page Free Space page to find a page with space for the new record Inserts into tables with a clustered index can cause page splits or hotspots When a particular part of the database is particularly popular, it is called a hotspot Hotspots create contention problems Clustered indexes on keys that arrive in random order cause page splits Clustered indexes on keys that arrive in index order create hotspots All inserts are again at the end of the table Identities Dates However, there will be no page splits or collapses

Indexes and Updates


When data is modified, indexes may have to be modified as well Changing data in a table with no indexes Data will be changed in place unless the update means that the row will no longer fit on the page If the row wont fit, it will simply be moved to a new page Changing a clustering key column The row will be deleted from its original location It will be inserted into the new location All nonclustered indexes must be maintained Exception: A change that doesnt affect the index order Thomas Thompson Tyne Changing Thompson to Thompsen will be done in place Avoid clustered indexes on columns that frequently change

Indexes and Updates (continued)


Changing any non-key column in a table with only nonclustered indexes Data will be changed in place unless the row will no longer fit If the row will no longer fit It will be moved to another page A forwarding pointer will be left on the original page that points to the new location Index pointers dont need to be updated An additional I/O will now be required There will never be more than one forwarding pointer, no matter how many times a row moves To see whether updates have produced forwarding pointers, use 2000 DBCC SHOWCONTIG('tablename') WITH TABLERESULTS SELECT forwarded_record_count FROM sys.dm_db_index_physical_stats (db_id('databasename'), 2005 2008 object_id('tablename'), NULL, NULL, NULL)

Indexes and Deletes


Deleting from a heap Row is physically deleted Remaining rows on page below the deleted record are not moved up at the time the delete happens Page will be compressed when space is needed for another row on the page Deleting from a leaf node Data pages of clustered index Leaf nodes of nonclustered index Records may not be physically deleted at the time the delete is issued Records may be marked as ghost records Used by lock manager Not retrievable by users Special SQL Server process cleans up the ghost records Wont clean up records that are part of an active transaction Doesnt compress page

Deleting from a nonleaf node No ghost records Page is not compressed When rows are deleted, both nonclustered and clustered indexes must be maintained When the last row is deleted from a page (index or data), the page is deallocated and returned to the free space pool Unless it is the only page in the table A table always has at least one page, even if it is empty

Indexes and Deletes (continued)

What Are Index Statistics?


Metric used by the optimizer in determining whether or not an index is useful for a particular query Stored in
An image column named statblob in the sysindexes table 2008 An internal and invisible table 2005
2000

Essentially a histogram Statistics are kept for 200 steps

Selectivity

The statistics allow the optimizer to determine the selectivity of an index A unique, single-column index always has a selectivity of 1 One index entry points to exactly one row Another term for this is density Density is the inverse of selectivity Density values range from 0 to 1 A selective index has a density of 0.10 or less A unique, single-column index always has a density of 0.0 When the index is composite, it becomes a little more complicated SQL Server maintains detailed statistics only on the leftmost column It does compute density for each column Assume there is an index on (col1, col2, col3) Density is computed for Col1 Col1 + Col2 Col1 + Col2 + Col3

Exploring Statistics
To see the index statistics, use DBCC SHOW_STATISTICS ('tablename', {'indexname' | 'statisticsname'}) DBCC SHOW_STATISTICS ('Employees', 'EmployeeName_Idx') Interpreting the step output
RANGE_HI_KEY RANGE_ROWS EQ_ROWS Upper-bound value of a histogram step Number of rows from the sample that fall within a histogram step, excluding the upper bound Number of rows from the sample that are equal in value to the upper bound of the histogram step

DISTINCT_RANGE_ROWS Number of distinct values within a histogram step, excluding the upper bound AVG_RANGE_ROWS Average number of duplicate values within a histogram step, excluding the upper bound

Statistics for INDEX 'EmployeeName_IDX'. Updated Rows

Exploring Index Statistics (continued)


Rows Sampled Steps Density

Avg key length -------------------- ------- ------------- ------ ------------ -----------Jan 27 2002 7:00PM 10009 10009 200 1.3958309E-4 28.475372 All density -----------1.4271443E-4 9.9940036E-5 9.9910081E-5 RANGE_HI_KEY -----------Aaby Abrahamson . . . Zuran Zvonek Average Length -------------13.252672 24.475372 28.475372 RANGE_ROWS ---------0.0 59.0 13.0 0.0 Columns -------LastName LastName, FirstName LastName, FirstName, EmployeeID DISTINCT_RANGE_ROWS -------------------0 43 10 0 The report for SQL Server 2005 and later is slightly different at the beginning

EQ_ROWS ------2.0 2.0 1.0 1.0

AVG_RANGE_ROWS -------------0.0 1.3409091 1.3 0.0

Distribution steps

Statistics Maintenance
By default, SQL Server will automatically maintain the statistics Index statistics are computed (or recomputed) when the index is created or rebuilt SQL Server keeps track of the updates to a table Each INSERT, UPDATE, or DELETE statement updates a counter in sysindexes named rowmodctr Note that TRUNCATE TABLE does not modify this counter Whenever the statistics are recomputed, the counter is set back to zero When you issue a query, the optimizer checks rowmodctr to see whether the statistics are up to date If they are not, the statistics will be updated

Statistics Maintenance (continued)time in a production system Note that this may not always happen at the best
Can turn off automatic update Can manually update In SQL Server 2005 and later, can set the AUTO_UPDATE_STATISTICS_ASYNC database option
Table type Empty condition Threshold when empty Number of changes >= 500 Number of changes >= 6 Threshold when not empty Number of changes >= 500 + (Number of rows * 20%) Number of changes >= 500

Permanent < 500 rows Temporary < 6 rows

Example: Assume that a table has 1,000 rows The threshold would be 500 + (.20 * 1000) You would expect to see the statistics automatically updated after about 700 modifications

Estimating Page Accesses


No index Number of data pages in the table Equality query using a unique index Nonclustered index Number of index levels + 1 (if there is no clustered index) Number of nonclustered index levels + the number of levels in the clustered index Clustered index Number of index levels

Number of levels + number of data pages


Root
Abby Bob Carol Dave

Estimating Page Accesses for a Clustered Index

Number of data pages = number of qualifying rows / rows per page

Abby

Ada

Andy

Ann

Ada

Alan

Amanda

Amy

Database and leaf node

Estimating Page Accesses for a Nonclustered Index rows or Number of levels + number of qualifying leaf pages + number of
number of levels + number of qualifying leaf pages + (number of rows * number of clustered index levels) Number of qualifying leaf pages = number of Abby Bob Carol Dave Root qualifying rows / rows per page Assumes every row is on a different page
Abby Ada Andy Ann

Ada

Alan

Amanda

Amy

Leaf node

Amy

Ada

Amanda

Alan

Database

Covering Indexes
When a nonclustered index includes all the data requested in a query (both the items in the SELECT list and the WHERE clause), it is called a covering index With a covering index, there is no need to access the actual data pages Only the leaf nodes of the nonclustered index are accessed Leaf node of a nonclustered index on LastName, FirstName, Birthdate
Adams Douglas Mark Susan 1/14/1956 12/12/1947 3 4

Jones
Smith

John
Mary

4/15/1967
7/14/1970

1
2

The last column is EmployeeID. Remember that the clustering key is always included in a nonclustered index.

Because the leaf node of a clustered index is the data itself, a clustered index covers all queries

Covering Indexes (continued)


This index covers the following queries: SELECT EmployeeID, LastName, FirstName, Birthdate FROM Employees WHERE Birthdate >= '1/12/1941' SELECT LastName FROM Employees WHERE EmployeeID = 7 SELECT LastName, FirstName FROM Employees WHERE LastName BETWEEN 'A' AND 'C' Remember that the number of accesses for a nonclustered index is the number of levels + the number of qualifying leaf pages + either the number of rows or (the number of rows * the number of levels in the clustered index) A covering index eliminates the number of rows term from the equation The optimizer is highly likely to use a nonclustered index because of this

Non-Key Index Columns


SQL Server 2005 and later allow you to include columns in a non-clustered index that are not part of the key Allows the index to cover more queries Included columns only appear in the leaf level of the index Up to 1,023 additional columns Can include data types that cannot be key columns Except text, ntext, and image data types Syntax CREATE [ UNIQUE ] NONCLUSTERED INDEX index_name ON <object> ( column [ ASC | DESC ] [ ,...n ] ) [ INCLUDE ( column_name [ ,...n ] ) ] Example CREATE NONCLUSTERED INDEX NameRegion_IDX ON Employees(LastName) INCLUDE (Region)