One of the most important routes to high performance in a SQL Server database is the index.

Indexes speed up the querying process by providing swift access to rows in the data tables, much as a book's index helps you find information quickly within that book. In this article, I provide an overview of SQL Server indexes and explain how they're defined within a database and how they can make the querying process faster. Most of this information applies to indexes in both SQL Server 2005 and 2008; the basic structure has changed little from one version to the next. In fact, much of the information also applies to SQL Server 2000. This does not mean there haven't been changes. New functionality has been added with each successive version; however, the underlying structures have remained largely the same. So for the sake of brevity, I stick with 2005 and 2008 and point out where the two versions differ.

Index Structures
Indexes are created on columns in tables or views. The index provides a fast way to look up data based on the values within those columns. For example, if you create an index on the primary key and then search for a row of data based on one of the primary key values, SQL Server first finds that value in the index, and then uses the index to quickly locate the entire row of data. Without the index, a table scan would have to be performed in order to locate the row, which can have a significant effect on performance.

You can create indexes on most columns in a table or a view. The exceptions are primarily those columns configured with large object (LOB) data types, such as image, text, and varchar(max). You can also create indexes on XML columns, but those indexes are slightly different from the basic index and are beyond the scope of this article. Instead, I'll focus on those indexes that are implemented most commonly in a SQL Server database.

An index is made up of a set of pages (index nodes) that are organized in a B-tree structure. This structure is hierarchical in nature, with the root node at the top of the hierarchy and the leaf nodes at the bottom, as shown in Figure 1.

Figure 1: B-tree structure of a SQL Server index

When a query is issued against an indexed column, the query engine starts at the root node and navigates down through the intermediate nodes, with each layer of the intermediate level more granular than the one above. The query engine continues down through the index nodes until it reaches the leaf node. For example, if you're searching for the value 123 in an indexed column, the query engine would first look in the root level to determine which page to reference in the top intermediate level. In this example, the first page points to the values 1-100 and the second page to the values 101-200, so the query engine would go to the second page on that level. The query engine would then determine that it must go to the third page at the next intermediate level. From there, the query engine would navigate to the leaf node for value 123. The leaf node will contain either the entire row of data or a pointer to that row, depending on whether the index is clustered or nonclustered.

Clustered Indexes
A clustered index stores the actual data rows at the leaf level of the index. Returning to the example above, that would mean that the entire row of data associated with the primary key value of 123 would be stored in that leaf node. An important characteristic of the clustered index is that the indexed values are sorted in either ascending or descending order. As a result, there can be only one clustered index on a table or view. In addition, data in a table is sorted only if a clustered index has been defined on a table.

Example: A printed phone directory is a great example of a clustered index. Each entry in the directory represents one row of the table.

Note: A table that has a clustered index is referred to as a clustered table. A table that has no clustered index is referred to as a heap. A table can have only one clustered index.
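The root-to-leaf navigation described for the value 123 can be sketched in a few lines of Python. This is only an illustration of the idea, not SQL Server's actual page format: the `Page` class, `build_index`, `seek`, and the fan-out of 10 entries per page are all assumptions made for the example, and the resulting tree has a single intermediate level rather than the two shown in Figure 1.

```python
from bisect import bisect_left


class Page:
    """One index page (node). Leaf pages carry rows; other pages carry child pages."""

    def __init__(self, keys, children=None, rows=None):
        # Leaf page: keys are the indexed values stored on the page.
        # Non-leaf page: keys[i] is the highest indexed value reachable through children[i].
        self.keys = keys
        self.children = children
        self.rows = rows

    @property
    def is_leaf(self):
        return self.children is None


def build_index(rows_by_key, fanout):
    """Build the tree bottom-up from a non-empty dict, with at most `fanout` entries per page."""
    items = sorted(rows_by_key.items())
    level = [
        Page([k for k, _ in items[i:i + fanout]], rows=[r for _, r in items[i:i + fanout]])
        for i in range(0, len(items), fanout)
    ]
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        level = [Page([p.keys[-1] for p in g], children=g) for g in groups]
    return level[0]


def seek(root, key):
    """Walk from the root to a leaf; return (row or None, number of pages visited)."""
    page, pages_read = root, 1
    while not page.is_leaf:
        slot = bisect_left(page.keys, key)  # first child whose range can hold the key
        if slot == len(page.keys):          # key is larger than anything in the index
            return None, pages_read
        page = page.children[slot]
        pages_read += 1
    slot = bisect_left(page.keys, key)
    if slot < len(page.keys) and page.keys[slot] == key:
        return page.rows[slot], pages_read
    return None, pages_read


# Hypothetical table: 1,000 rows keyed 1..1000, ten entries per page.
table = {key: f"row-{key}" for key in range(1, 1001)}
root = build_index(table, fanout=10)
print(seek(root, 123))  # ('row-123', 3)
```

With these made-up numbers the root sends the search to its second page (values 101-200), that page sends it to its third page (values 121-130), and the row is found there after three page visits, whereas a scan would have to read pages until it reached the row.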

Nonclustered Indexes
Unlike a clustered index, the leaf nodes of a nonclustered index contain only the values from the indexed columns and row locators that point to the actual data rows, rather than contain the data rows themselves. This means that the query engine must take an additional step in order to locate the actual data. Put another way, a nonclustered index has the indexed columns and a pointer or bookmark pointing to the actual row.

Example: The index in the back of a book is an example of a nonclustered index. In the case of our example it contains a page number. Another example could be a search done on Google or another of the search engines. The results on the page contain links to the original web pages.

A row locator's structure depends on whether it points to a clustered table or to a heap. If referencing a clustered table, the row locator points to the clustered index, using the value from the clustered index to navigate to the correct data row. If referencing a heap, the row locator points to the actual data row.

A nonclustered index does not determine the order in which the table's data rows are stored, as a clustered index does; however, you can create more than one nonclustered index per table or view. SQL Server 2005 supports up to 249 nonclustered indexes, and SQL Server 2008 supports up to 999. This certainly doesn't mean you should create that many indexes. Indexes can both help and hinder performance, as I explain later in the article.

In addition to being able to create multiple nonclustered indexes on a table or view, you can also add included columns to your index. This means that you can store at the leaf level not only the values from the indexed column, but also the values from non-indexed columns. This strategy allows you to get around some of the limitations on indexes. For example, you can include non-indexed columns in order to exceed the size limit of indexed columns (900 bytes in most cases).

Index Types
In addition to an index being clustered or nonclustered, it can be configured in other ways:
• Composite index: An index that contains more than one column. In both SQL Server 2005 and 2008, you can include up to 16 columns in an index, as long as the index doesn't exceed the 900-byte limit. Both clustered and nonclustered indexes can be composite indexes.
• Unique index: An index that ensures the uniqueness of each value in the indexed column. If the index is a composite, the uniqueness is enforced across the columns as a whole, not on the individual columns. For example, if you were to create an index on the FirstName and LastName columns in a table, the names together must be unique, but the individual names can be duplicated. A unique index is automatically created when you define a primary key or unique constraint:
    • Primary key: When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique, clustered index if a clustered index does not already exist on the table or view. However, you can override the default behavior and define a unique, nonclustered index on the primary key.
    • Unique: When you define a unique constraint, SQL Server automatically creates a unique, nonclustered index. You can specify that a unique clustered index be created if a clustered index does not already exist on the table.

• Covering index: A type of index that includes all the columns that are needed to process a particular query. For example, your query might retrieve the FirstName and LastName columns from a table, based on a value in the ContactID column. You can create a covering index that includes all three columns.

Index Design
As beneficial as indexes can be, they must be designed carefully. Because they can take up significant disk space, you don't want to implement more indexes than necessary. In addition, indexes are automatically updated when the data rows themselves are updated, which can lead to additional overhead and can affect performance. As a result, index design should take into account a number of considerations.

Database
As mentioned above, indexes can enhance performance because they can provide a quick way for the query engine to find data. However, you must also take into account whether and how much you're going to be inserting, updating, and deleting data. When you modify data, the indexes must also be modified to reflect the changed data, which can significantly affect performance. You should consider the following guidelines when planning your indexing strategy:
• For tables that are heavily updated, use as few columns as possible in the index, and don't over-index the tables.
• If a table contains a lot of data but data modifications are low, use as many indexes as necessary to improve query performance. However, use indexes judiciously on small tables because the query engine might take longer to navigate the index than to perform a table scan.
• For clustered indexes, try to keep the length of the indexed columns as short as possible. Ideally, try to implement your clustered indexes on unique columns that do not permit null values. This is why the primary key is often used for the table's clustered index, although query considerations should also be taken into account when determining which columns should participate in the clustered index.
• The uniqueness of values in a column affects index performance. In general, the more duplicate values you have in a column, the more poorly the index performs. On the other hand, the more unique each value, the better the performance. When possible, implement unique indexes.
• For composite indexes, take into consideration the order of the columns in the index definition. Columns that will be used in comparison expressions in the WHERE clause (such as WHERE FirstName = 'Charlie') should be listed first. Subsequent columns should be listed based on the uniqueness of their values, with the most unique listed first.
• You can also index computed columns if they meet certain requirements. For example, the expression used to generate the values must be deterministic (which means it always returns the same result for a specified set of inputs). For more details about indexing computed columns, see the topic "Creating Indexes on Computed Columns" in SQL Server Books Online.

Queries
Another consideration when setting up indexes is how the database will be queried. As mentioned above, you must take into account the frequency of data modifications. In addition, you should consider the following guidelines:

• Try to insert or modify as many rows as possible in a single statement, rather than using multiple queries.
• Create nonclustered indexes on columns used frequently in your statement's predicates and join conditions.
• Consider indexing columns used in exact-match queries.

Index Basics
In this article, I've tried to give you a basic overview of indexing in SQL Server and provide some of the guidelines that should be considered when implementing indexes. This by no means is a complete picture of SQL Server indexing. The design and implementation of indexes are an important component of any SQL Server database design, not only in terms of what should be indexed, but where those indexes should be stored, how they should be partitioned, how data will be queried, and other important considerations. In addition, there are index types that I have not discussed, such as XML indexes as well as the filtered and spatial indexes supported in SQL Server 2008. This article, then, should be seen as a starting point, a way to familiarize yourself with the fundamental concepts of indexing. In the meantime, be sure to check out SQL Server Books Online for more information about the indexes described here as well as the other types of indexes.

HOW DOES SQL SERVER STORE INDEXES?
Recently I felt pretty strongly that I needed to have a better understanding of how SQL Server worked. So I studied for (and passed) the Microsoft Exam 70-432, SQL Server 2008 Implementation and Maintenance. As a result of learning more about SQL Server, I have decided to write a series of articles that might help other developers understand some of the basic concepts of SQL Server. These articles are by no means meant to be comprehensive, but more of a brief look into some facet of SQL Server. Below is my first installment in this series, 'How does SQL Server store indexes?'

Microsoft's SQL Server uses a B-tree data structure to organize indexes. This structure is hierarchical, with the root node at the top of the hierarchy, leaf nodes (also called data pages) at the bottom, and intermediate nodes in the middle. The main idea behind the B-tree structure is that it is always balanced: every leaf node is the same number of levels away from the root, so no part of the tree is deeper than another. This allows the data structure to search quickly for data with a minimal number of disk reads. Remember, an index is only useful if it helps find data quickly in a table regardless of the amount of data in the table.

Index leaf nodes are referred to as data pages in SQL Server and each page can store up to 8 KB of data. Rows on a page are stored in either ascending or descending sort order. As data is inserted it is placed in its appropriate sorted location in the tree structure. If data is consistently inserted into the middle of a page, eventually all of the rows on the index page will no longer fit on the single page and a page split occurs. When a page split occurs, a new page is allocated and roughly half of the rows are moved to it, so the data ends up divided between the original page and the new one.

How the row data is stored in each page depends on what type of index (clustered or nonclustered) is created. A clustered index (think phone book: all the data is present with the index entry) stores the entire row of data associated with the key value at the leaf level of the index. A table can only have one clustered index, and clustered indexes are stored in either ascending or descending sort order. Nonclustered indexes (think index in the back of a book: the index tells you where to go look for the data) only contain the indexed columns and row locators to the actual row data, and they do not determine the order in which the table's rows are stored.

Indexes aren't free; they come with a cost. Primarily, each index you create, while speeding up the selecting of data, has a negative effect on the inserting, updating, or deleting of data because the database must update the indexes each time data changes.

Hopefully you've found something new in this article, or maybe it's inspired you to go research SQL Server indexes. There is definitely a lot more to indexes than what I've presented here.
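To make the page-split behavior described in this article more concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not SQL Server's storage engine: pages are plain sorted lists, `PAGE_CAPACITY` is an invented limit of four rows (real pages are sized in bytes, up to 8 KB), `insert_key` is a hypothetical helper, and a split is modeled by allocating one new page and moving the upper half of the rows onto it.

```python
from bisect import bisect_right, insort

PAGE_CAPACITY = 4  # rows per page; invented for the example


def insert_key(pages, key):
    """Insert `key` into a non-empty list of sorted, non-empty pages.

    Returns True if the insert overflowed a page and caused a split.
    """
    # Pick the last page whose first key is <= key (or the first page if none is).
    first_keys = [page[0] for page in pages]
    idx = max(bisect_right(first_keys, key) - 1, 0)
    page = pages[idx]
    insort(page, key)  # keep the rows on the page in sorted order
    if len(page) <= PAGE_CAPACITY:
        return False
    # Overflow: allocate a new page and move the upper half of the rows to it.
    mid = len(page) // 2
    new_page = page[mid:]
    del page[mid:]
    pages.insert(idx + 1, new_page)
    return True


pages = [[10, 20, 30, 40]]        # one full page
split = insert_key(pages, 25)     # insert into the middle of the full page
print(split, pages)               # True [[10, 20], [25, 30, 40]]
```

After the split both pages have free space again, which is why later inserts into the same key range are cheap until one of the pages fills up once more.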

B-tree
DEFINITION: A B-tree is a method of placing and locating files (called records or keys) in a database. (The meaning of the letter B has not been explicitly defined.) The B-tree algorithm minimizes the number of times a medium must be accessed to locate a desired record, thereby speeding up the process.

B-trees are preferred when decision points, called nodes, are on hard disk rather than in random-access memory (RAM). It takes thousands of times longer to access a data element from hard disk as compared with accessing it from RAM, because a disk drive has mechanical parts, which read and write data far more slowly than purely electronic media. B-trees save time by using nodes with many branches (called children), compared with binary trees, in which each node has only two children. When there are many children per node, a record can be found by passing through fewer nodes than if there are two children per node. A simplified example of this principle is shown below.

In a tree, records are stored in locations called leaves. This name derives from the fact that records always exist at end points; there is nothing beyond them. The maximum number of children per node is the order of the tree. The number of required disk accesses is the depth. The image at left shows a binary tree for locating a particular record in a set of eight leaves. The image at right shows a B-tree of order three for locating a particular record in a set of eight leaves (the ninth leaf is unoccupied, and is called a null). The binary tree at left has a depth of four; the B-tree at right has a depth of three. Clearly, the B-tree allows a desired record to be located faster, assuming all other system parameters are identical. The tradeoff is that the decision process at each node is more complicated in a B-tree as compared with a binary tree. A sophisticated program is required to execute the operations in a B-tree. But this program is stored in RAM, so it runs fast.

In a practical B-tree, there can be thousands, millions, or billions of records. Not all leaves necessarily contain a record, but at least half of them do. The difference in depth between binary-tree and B-tree schemes is greater in a practical database than in the example illustrated here, because real-world B-trees are of higher order (32, 64, 128, or more). Depending on the number of records in the database, the depth of a B-tree can and often does change. Adding a large enough number of records will increase the depth; deleting a large enough number of records will decrease the depth. This ensures that the B-tree functions optimally for the number of records it contains.

Also see tree structure. Compare binary tree, M-tree, splay tree, and X-tree.
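The depth figures quoted in this definition (four for the binary tree, three for the order-three B-tree, both over eight leaves) can be reproduced with a short calculation. The sketch below is only a worked example: the `depth` function is a hypothetical helper, and it assumes depth is counted as the number of levels from the root down to the leaves inclusive, with every node using its full order.

```python
def depth(leaves, order):
    """Levels (root through leaves, inclusive) needed so that a tree whose
    nodes each have up to `order` children can reach `leaves` end points."""
    levels, reachable = 1, 1
    while reachable < leaves:
        reachable *= order  # each extra level multiplies the reachable end points
        levels += 1
    return levels


print(depth(8, 2))   # 4: the binary tree in the example
print(depth(8, 3))   # 3: the order-three B-tree (nine leaf slots, one unused)

# The gap widens for realistic sizes and orders.
print(depth(1_000_000, 2))    # 21
print(depth(1_000_000, 128))  # 4
```

Under these assumptions, a million records need 21 levels in a binary tree but only four in a B-tree of order 128, which is the difference in disk accesses the definition is pointing at.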

Structure of a B-Tree Index
At the top of the index is the root, which contains entries that point to the next level in the index. At the next level are branch blocks, which in turn point to blocks at the next level in the index. At the lowest level are the leaf nodes, which contain the index entries that point to rows in the table. The leaf blocks are doubly linked to facilitate scanning the index in an ascending as well as descending order of key values.

Format of Index Leaf Entries
An index entry is made up of the following components:
• An entry header, which stores number of columns and locking information
• Key column length-value pairs, which define the size of a column in the key followed by the value for the column (The number of such pairs is a maximum of the number of columns in the index.)
• ROWID of a row, which contains the key values

Index Leaf Entry Characteristics
In a B-tree index on a nonpartitioned table:
• Key values are repeated if there are multiple rows that have the same key value.
• There is no index entry corresponding to a row that has all key columns that are NULL. Therefore a WHERE clause specifying NULL will always result in a full table scan.
• Restricted ROWID is used to point to the rows of the table, since all rows belong to the same segment.

Effect of DML Operations on an Index
The Oracle server maintains all the indexes when DML operations are carried out on the table. Here is an explanation of the effect of a DML command on an index:
• Insert operations result in the insertion of an index entry in the appropriate block.
• Deleting a row results only in a logical deletion of the index entry. The space used by the deleted row is not available for new entries until all the entries in the block are deleted.
• Updates to the key columns result in a logical delete and an insert to the index. The PCTFREE setting has no effect on the index except at the time of creation. A new entry may be added to an index block even if it has less space than that specified by PCTFREE.
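Two of the points above, the doubly linked leaf blocks and the logical deletion of entries, can be sketched in Python as follows. This is a simplified model for illustration only and says nothing about Oracle's real block layout: `LeafBlock`, `link`, `scan`, and `logical_delete` are hypothetical names, and the ROWID strings are made up.

```python
class LeafBlock:
    """A leaf block holding index entries and links to its neighbours."""

    def __init__(self, entries):
        # Each entry is [key, rowid, deleted]; the rowid strings are invented.
        self.entries = [[key, rowid, False] for key, rowid in entries]
        self.prev = None
        self.next = None


def link(blocks):
    """Doubly link the blocks in order; return (first, last)."""
    for left, right in zip(blocks, blocks[1:]):
        left.next, right.prev = right, left
    return blocks[0], blocks[-1]


def scan(start, descending=False):
    """Yield live (key, rowid) pairs, following next links or prev links."""
    block = start
    while block is not None:
        ordered = reversed(block.entries) if descending else block.entries
        for key, rowid, deleted in ordered:
            if not deleted:
                yield key, rowid
        block = block.prev if descending else block.next


def logical_delete(first, key):
    """Flag every live entry for `key` as deleted without freeing its slot."""
    flagged = 0
    block = first
    while block is not None:
        for entry in block.entries:
            if entry[0] == key and not entry[2]:
                entry[2] = True
                flagged += 1
        block = block.next
    return flagged


# Key 20 appears twice because two rows share that key value.
first, last = link([LeafBlock([(10, "r1"), (20, "r2")]),
                    LeafBlock([(20, "r3"), (30, "r4")])])
print(list(scan(first)))                   # ascending scan from the first block
print(list(scan(last, descending=True)))   # descending scan from the last block
logical_delete(first, 20)
print(list(scan(first)), len(first.entries))  # entry hidden, slot still occupied
```

The last line shows the effect the notes describe: the deleted entries no longer appear in a scan, but the block still holds them, so the space is not reusable in this model until the block is emptied.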
