You are on page 1of 15

Joe Webb 29 September 2008

296825 views 475 14

Joe Webb Using Covering Indexes


29 September to Improve Query
2008
Performance
4
Designers of database systems will often
7
assume that the use of a clustered index is
296825 views
5
always the best approach. However the
14 nonclustered Covering index will usually
provide the optimum performance of a query.

Introduction
There are many ways to measure a database’s
performance. We can look at its overall availability;
that is: are our users able to get the information they
need when they need it? We can consider its
concurrency; how many users can query the
database at one time? We can even watch the
number of transactions that occur per second to
gauge its level of activity.

Often for our users however, it comes down to one


very simple, but somewhat nebulous, measurement
– perceived responsiveness. Or put another way,
when a user issues a query does he receive his
results without having to wait too long?

Of course, there are many factors that in uence this.


There is the hardware itself and how its con gured.
There is the physical implementation of the
database; the logical design of the database; and
even how the queries themselves are written.
In this article however, I will focus on one speci c
technique that can be used to improve performance
– creating covering indexes.

An indexing refresher
Before taking on the challenge of covering indexes,
let’s take a moment for a brief but very relevant
review of index types and terminology.

Indexes allow SQL Server to quickly nd rows in a


table based on key values, much like the index of a
book allows us to easily nd the pages that contain
the information we want. There are two types of
indexes in SQL Server, clustered and nonclustered
indexes.

Clustered Indexes    
A clustered index is an index whose leaf nodes, that
is the lowest level of the index, contain the actual
data pages of the underlying table. Hence the index
and the table itself are, for all practical purposes,
one and the same.  Each table can have only one
clustered index. For more information on clustered
indexes, see the Books Online topic “Clustered Index
Structures” (http://msdn.microsoft.com/en-
us/library/ms177443.aspx).

When a clustered index is used to resolve a query,


SQL Server begins at the root node for the index and
traverses the intermediate nodes until it locates the
data page that contains the row it’s seeking.

Many database designs make proli c use of


clustered indexes. In fact, it is generally considered
a best practice to include a clustered index on each
table; of course that’s painting with a very broad
brush and there will most assuredly be exceptions.
For more information about the bene ts of clustered
indexes, see the SQL Server Best Practices Article
entitled “Comparing Tables Organized with
Clustered Indexes versus Heaps” on TechNet .

Lets’ consider an example. In the Figure 1, the


Customers table has a clustered index de ned on
the Customer_ID column. When a query is executed
that searches by the Customer_ID column, SQL
Server navigates through the clustered index to
locate the row in question and returns the data. This
can be seen in the Clustered Index Seek operation in
the query’s Execution Plan.

Figure 1. Clustered index execution plan

The following Transact-SQL statement was used to


create the clustered index on the Customer_ID
column.

CREATE CLUSTERED INDEX [ix_Customer_ID


(
[Customer_ID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NO

Nonclustered indexes
Nonclustered indexes use a similar methodology to
store indexed data for tables within SQL Server.
However in a nonclustered index, the lowest level of
the index does not contain the data page of the
table. Instead, it contains the information that
allows SQL Server to navigate to the data pages it
needs. For tables that have a clustered index, the
leaf node of the nonclustered index contains the
clustered index keys. In the previous example, the
leaf node of a nonclustered index on the Customers
table would contain the Customer_ID key.

If the underlying table does not have a clustered


index (this data structure is known as a heap), the
leaf node of the nonclustered index contains a row
locator to the heap data pages.

In the following example, a nonclustered composite


index has been created on the Customers table as
described in the following Transact-SQL code.

CREATE NONCLUSTERED INDEX [ix_Customer


(
[Last_Name] ASC,
[First_Name] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_N

In this case when a query that searched by


customer last name was executed, the SQL Server
query optimizer chose to use the
ix_Customer_Name index to resolve the query. This
can be seen in the Execution Plan in the following
gure.

Figure 2. Nonclustered index execution plan


For more information on clustered indexes, see the
Books Online topic “Nonclustered Index Structures” .

Using Nonclustered indexes


As illustrated in the preceding example,
nonclustered indexes may be employed to provide
SQL Server with an e cient way to retrieve data
rows. However, under some circumstances, the
overhead associated with nonclustered indexes may
be deemed too great by the query optimizer and SQL
Server will resort to a table scan to resolve the
query. To understand how this may happen, let’s
examine the preceding example in more detail.

Key Lookups
Looking again at the graphical query execution plan
depicted in Figure 2, notice that the plan not only
includes an Index Seek operation that uses the
ix_Customer_Name nonclustered index, it also
includes a Key Lookup operation.

SQL Server uses a Key Lookup to retrieve non-key


data from the data page when a nonclustered index
is used to resolve the query. That is, once SQL
Server has used the nonclustered index to identify
each row that matches the query criteria, it must
then retrieve the column information for those rows
from the data pages of the table.

Since the leaf node of the nonclustered index


contains the key value information for the row, SQL
Server must navigate through the clustered index to
retrieve the columnar information for each row of
the result set. In this example, SQL Server choose to
do this using a nested loop join type.

This query produced a result of 1,000 rows and


nearly 100% of the expense of the query was directly
attributed to the Key Lookup operation. Digging a
little deeper into the Key Lookup operation, we can
see why.

Figure 3. Key lookup operation properties

This Key Lookup operation was executed 1000


times, once for each row of the result set. 

Resorting to Table Scans


As the number of rows in the result set increases, so
does the number of Key lookups. At some point, the
cost associated with the Key Lookup will outweigh
any bene t provided by the nonclustered index.

To illustrate this point, let’s modify the query so that


it retrieves more rows. Figure 4. depicts the new
query along with the actual execution plan used to
resolve the query.
Figure 4. Resorting to a table scan

The new query searches for a range of customers


whose last name is between “Roland” and “Smith”.
There are 69,000 of them in our database. From the
actual execution plan, we can see that the query
optimizer determined that the overhead cost of
performing a Key Lookup for each of the 69,000
rows was more than simply traversing the entire
table via a table scan. Hence, our
ix_Customer_Name index was not used at all during
the query.

Figure 5  shows some additional properties of the


table scan.
Figure 5. Properties for the table scan operation

One may be tempted to force SQL Server to resolve


the query using the nonclustered index by supplying
a table hint as shown in the following illustration.

Figure 6. Using a table hint to resolve the query

This is almost always a bad idea since the optimizer


generally does a good job in choosing an
appropriate execution plan. Additionally, the
optimizer bases its decisions on column statistics;
those are likely to change over time. A table hint that
works well today, may not work well in the future
when the selectivity of the key columns change.

Figure 7 shows the properties for the Key Lookup


when we forced SQL Server to use the nonclustered
ix_Customer_Name index. The Estimated Operator
Cost for the Key Lookup is 57.02 compared to 12.17
for the Clustered Index Scan shown in Figure 5.
Forcing SQL Server to use the index signi cantly
affected performance, and not for the better!

Figure 7. Properties of the Key Lookup for the table


hint

Covering Indexes
So, if Key Lookups can be detrimental to
performance during query resolution for large result
sets, the natural question is: how can we avoid
them? To answer that question, let’s consider a
query that does not require a Key Lookup.

Let’s begin by modifying our query so that it no


longer selects the Email_Address column. Figure 8
illustrates this updated query along with its actual
execution plan.

Figure 8. Reduced query to eliminate the Key Lookup

The new execution plan has been streamlined and


only uses the ix_Customer_Name nonclustered
index. Looking at the properties of the operation
providers further evidence of the improvement. The
properties are shown in Figure 9.

Figure 9. Reduced query to eliminate the Key


Lookup properties
The Estimated Operator Cost went down
dramatically, from 12.17 in Figure 5 to 0.22 in Figure
9.  We could also look at the Logical and Physical
Read characteristics by setting STATISTICS IO on,
however for this demonstration it’s su cient to view
the Operator Costs for each operation.

The observed improvement is due to the fact that


the nonclustered index contained all of the required
information to resolve the query. No Key Lookups
were required. An index that contains all information
required to resolve the query is known as a
“Covering Index”; it completely covers the query.

Using the Clustered Key Column


Recall that if a table has a clustered index, those key
columns are automatically part of the nonclustered
index. So, the following query is a covering query by
default.

Figure 10. Covering index using the clustered index keys

However unless your clustered index contains the


required columns, which is not the case in our
example, it will be insu cient for covering our query.

Adding Key Columns to the


Index
To increase the likelihood that a nonclustered index
is a covering index, it is tempting to begin adding
additional columns to the index key. For example, if
we regularly query the customer’s middle name and
telephone number, we could add those columns to
the ix_Customer_Name index. Or, to continue with
our previous example, we could the Email_Address
column to the index as shown in the following
Transact-SQL code.

CREATE NONCLUSTERED INDEX [ix_Customer


(
[Last_Name] ASC,
[First_Name] ASC,
[Email_Address] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NO

Before doing so, it is important to remember that


indexes must be maintained by SQL Server during
data manipulation operations. Too many index hurts
performance during write operations. Additionally,
the wider the index, that is to say the more bytes
that make up the index keys, the more data pages it
will take to store the index.

Furthermore, there are some built in limitations for


indexes. Speci cally, indexes are limited to 16 key
columns or 900 bytes, whichever comes rst, in
both SQL Server 2005 and SQL Server 2008. And
some datatypes cannot be used as index keys,
varchar(max) for instance.

Including Non-Key columns


SQL Server 2005 provided a new feature for
nonclustered indexes, the ability to include
additional, non-key columns in the leaf level of the
nonclustered indexes. These columns are
technically not part of the index, however they are
included in the leaf node of the index. SQL Server
2005 and SQL Server 2008 allow up to 1023
columns to be included in the leaf node.
To create a nonclustered index with included
columns, use the following Transact-SQL syntax.

CREATE NONCLUSTERED INDEX [ix_Customer


(
[Last_Name] ASC,
[First_Name] ASC
)
INCLUDE ( [Email_Address]) WITH (PAD_I

Rerunning our query yields an execution plan that


make use of our new index to rapidly return the
result set. The execution plan may be found in the
Figure 11.

Figure 11. Covering query with included columns

Notice that even though our query selects columns


that are not part of the nonclustered index’s key,
SQL Server is still able to resolve the query without
having to use a Key Lookup for each row. Since the
ix_CustomerEmail index includes the Email_Address
column as part of its de nition, the index “covers”
the query. The properties of the Nonclustered Index
Seek operator con rm our ndings as depicted in
the Figure 12.
Figure 12. Execution properties for the covering
query with included columns

From this execution plan, we can see that the


Estimate Operator Cost decreased from 12.17 in
Figure 5 to 0.41. Including an additional non-key
column has dramatically improved the performance
of the query.

For more information about including non-key


columns in a nonclustered index, See “Index with
Included Columns” in Books Online .

Summary
By including frequently queried columns in
nonclustered indexes, we can dramatically improve
query performance by reducing I/O costs. Since the
data pages for an nonclustered index are frequently
readily available in memory, covering indexes are the
usually the ultimate in query resolution.

You might also like