You are on page 1of 10

Knowing when to Rebuild Indexes

Brian Peasland
Database Administrator
EROS Data Center
3 February 2000

Abstract

Indexes provide a fast and efficient method of retrieving selected data from a table. By pointing to the
blocks that contain the selected data, the entire table does not have to read in order to extract the
required information. Most indexes in Oracle databases are built using the B-tree data structure. Contrary
to some widely accepted beliefs, indexes in Oracle are not self-balancing. After a table experiences a
large number of inserts, updates, and deletes, the index can become unbalanced and fragmented and
can hinder query performance. Knowing when to rebuild the indexes is a topic of some confusion. This
paper hopes to shine some light on the subject.

Where is the index now?

In order to understand what we must do with the index, we must first get an idea of the current state of
the index. This can be accomplished by using the ANALYZE INDEX VALIDATE STRUCTURE command.
Normally, the ANALYZE INDEX command creates either computed or estimated statistics for the index
that can be seen in the DBA_INDEXES view. Analyzing the index changes the optimizer’s execution
plans for queries that potentially use that index. This action may produce unintentional side effects,
especially if the index has not previously been analyzed. The VALIDATE STRUCTURE command can be
safely executed without affecting the optimizer. The VALIDATE STRUCTURE command populates the
SYS.INDEX_STATS table only. The SYS.INDEX_STATS table can be accessed with the public synonym
INDEX_STATS. The INDEX_STATS table will only hold validation information for one index at a time.
You will need to query this table before validating the structure of the next index.

Below is an example of ANALYZE INDEX VALIDATE STRUCTURE and sample output from
INDEX_STATS:

SQLWKS> ANALYZE INDEX shopping_basket_pk VALIDATE STRUCTURE;


Statement processed.
SQLWKS> SELECT
name,height,lf_rows,lf_blks,del_lf_rows,distinct_keys,used_space
2> FROM INDEX_STATS;
NAME HEIGHT LF_ROWS LF_BLKS DEL_LF_ROW
DISTINCT_K USED_SPACE
------------------------- --------- ---------- ---------- ---------- --------
-- ----------
SHOPPING_BASKET_PK 2 1 3 1
1 65
1 row selected.

I have the information, now what?

There are two rules of thumb to help determine if the index needs to be rebuilt. If it is determined that the
index needs to be rebuilt, this can easily be accomplished by the ALTER INDEX REBUILD command.
Although not necessarily recommended, this command could be executed during normal operating hours.
Rebuilding the index uses the existing index as a basis. The alternative is to drop and re-create the index.
Creating an index uses the base table as its data source that needs to put a lock on the table. The index
is also unavailable during creation.
First rule of thumb is if the index has height greater than four, rebuild the index. For most indexes, the
height of the index will be quite low, i.e. one or two. I have seen an index on a 3 million-row table that had
height three. An index with height greater than four may need to be rebuilt as this might indicate a skewed
tree structure. This can lead to unnecessary database block reads of the index. It is helpful to know the
data structure for the table and index. Most times, the index height should be two or less, but there are
exceptions.

The second rule of thumb is that the deleted leaf rows should be less than 20% of the total number of leaf
rows. An excessive number of deleted leaf rows indicates that a high number of deletes or updates have
occurred to the index column(s). The index should be rebuilt to better balance the tree. The
INDEX_STATS table can be queried to determine if there are excessive deleted leaf rows in relation to
the total number of leaf rows. Let’s look at an example:

SQLWKS> ANALYZE INDEX item_basket_pk VALIDATE STRUCTURE;


Statement processed.
SQLWKS> SELECT name,height,lf_rows,del_lf_rows,(del_lf_rows/lf_rows)*100 as
ratio
2> FROM INDEX_STATS;
NAME HEIGHT LF_ROWS DEL_LF_ROW RATIO
------------------------------ ---------- ---------- ---------- ----------
ITEM_BASKET_PK 1 235 74 31.4893617
1 row selected.
In this example, the ratio of deleted leaf rows to total leaf rows is clearly above 20%. This is a good
candidate for rebuilding. Let’s rebuild the index and examine the results.
SQLWKS> ALTER INDEX item_basket_pk REBUILD;
Statement processed.
SQLWKS> ANALYZE INDEX item_basket_pk VALIDATE STRUCTURE;
Statement processed.
SQLWKS> SELECT name,height,lf_rows,del_lf_rows,(del_lf_rows/lf_rows)*100 as
ratio
2> FROM INDEX_STATS;
NAME HEIGHT LF_ROWS DEL_LF_ROW RATIO
------------------------------ ---------- ---------- ---------- ----------
ITEM_BASKET_PK 1 161 0 0
1 row selected.

The index is rebuilt and validated once again. Examining the INDEX_STATS table shows that the 74
deleted leaf rows were dropped from the index. Notice that the total number of leaf rows went from 235 to
161, which is a difference of 74 leaf rows. This index should provide better performance for the
application.

A Case Study

To illustrate the ideas in this paper, a test case was created and various queries were executed against a
large table before and after rebuilding the primary key index. It should be noted that this test case was
created specifically for this paper and results in the field may vary.

The test table only contains two columns, ID and NAME. The table is described below:

REQ SQL> desc analyze;


Name Null? Type
------------------------------- -------- ----
ID NUMBER
NAME VARCHAR2(20)
The test table has a primary key, ANALYZE_PK on the ID field. The test table was populated with 1
million rows of data. After populating the table, over 270,000 rows were selectively deleted from the table.
This method ensured that the primary key index would have a sufficient number of deleted leaf rows.
After populating and deleting rows from the table, the index was validated as below:

REQ SQL> analyze index analyze_pk validate structure;

Index analyzed.

REQ SQL> select name,height,lf_rows,del_lf_rows,del_lf_rows/lf_rows from


index_stats;

NAME HEIGHT LF_ROWS DEL_LF_ROWS


DEL_LF_ROWS/LF_ROWS
------------------------------ ---------- ---------- ----------- ------------
-------
ANALYZE_PK 3 990206
277353 .280096263

It is now clear that the deleted leaf rows comprise approximately 28% of the total leaf rows.

Four different queries were executed against the table. All of these queries will utilize the primary key
index as verified by the explain plan. These queries are executed against the table with the current index.
The index is then rebuilt. These queries are re-executed and the results are compared. The four queries
are:

1. select count(*) from analyze where id between 250000 and 600000;


2. select count(*) from analyze where id in (50000,4321,698754);
3. select count(*) from analyze where id > 500000;
4. select count(*) from analyze where id = 4321;
With the current index, these three queries are executed against the test table. SQL Trace is enabled and
the queries are shown in the following excerpt from TKPROF:
*****************************************************************************
***

select count(*)
from
analyze where id between 250000 and 600000

call count cpu elapsed disk query current


rows
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
Parse 1 0.00 0.00 0 0 0
0
Execute 1 0.00 0.00 0 0 0
0
Fetch 2 0.47 0.47 0 293 0
1
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
total 4 0.47 0.47 0 293 0
1

Misses in library cache during parse: 0


Optimizer goal: CHOOSE
Parsing user id: 5 (SYSTEM)

Rows Execution Plan


------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
138614 INDEX (RANGE SCAN) OF 'ANALYZE_PK' (UNIQUE)

*****************************************************************************
***

select count(*)
from
analyze where id in (50000,4321,698754)

call count cpu elapsed disk query current


rows
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
Parse 1 0.00 0.01 0 0 0
0
Execute 1 0.00 0.00 0 0 0
0
Fetch 2 0.00 0.00 0 9 0
1
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
total 4 0.00 0.01 0 9 0
1

Misses in library cache during parse: 0


Optimizer goal: CHOOSE
Parsing user id: 5 (SYSTEM)

Rows Execution Plan


------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
0 CONCATENATION
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)

*****************************************************************************
***

select count(*)
from
analyze where id > 500000

call count cpu elapsed disk query current


rows
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
Parse 1 0.00 0.00 0 0 0
0
Execute 1 0.00 0.00 0 0 0
0
Fetch 2 1.11 1.11 0 1611 0
1
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
total 4 1.11 1.11 0 1611 0
1

Misses in library cache during parse: 1


Optimizer goal: CHOOSE
Parsing user id: 5 (SYSTEM)

Rows Execution Plan


------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
445545 INDEX (RANGE SCAN) OF 'ANALYZE_PK' (UNIQUE)

*****************************************************************************
***

select count(*)
from
analyze where id = 4321

call count cpu elapsed disk query current


rows
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
Parse 1 0.00 0.00 0 0 0
0
Execute 1 0.00 0.00 0 0 0
0
Fetch 2 0.00 0.00 0 3 0
1
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
total 4 0.00 0.00 0 3 0
1

Misses in library cache during parse: 1


Optimizer goal: CHOOSE
Parsing user id: 5 (SYSTEM)

Rows Execution Plan


------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)

*****************************************************************************
***
After these results have been obtained, the index is rebuilt. The index is then validated and the results are
show below:

REQ SQL> alter index analyze_pk rebuild;

Index altered.

REQ SQL> analyze index analyze_pk validate structure;

Index analyzed.

REQ SQL> select name,height,lf_rows,del_lf_rows,del_lf_rows/lf_rows from


index_stats
;

NAME HEIGHT LF_ROWS DEL_LF_ROWS


DEL_LF_ROWS/LF_ROWS
------------------------------ ---------- ---------- ----------- ------------
-------
ANALYZE_PK 3 712853 0
0

It is clear that the deleted leaf rows have been removed from the index. The four queries are run against
the table once again. Again, SQL Trace is enabled and the queries are shown in the following excerpt
from TKPROF:

*****************************************************************************
***

select count(*)
from
analyze where id between 250000 and 600000

call count cpu elapsed disk query current


rows
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
Parse 1 0.00 0.00 0 0 0
0
Execute 1 0.00 0.00 0 0 0
0
Fetch 2 0.23 0.23 0 679 0
1
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
total 4 0.23 0.23 0 679 0
1

Misses in library cache during parse: 0


Optimizer goal: CHOOSE
Parsing user id: 5 (SYSTEM)

Rows Execution Plan


------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
138614 INDEX (RANGE SCAN) OF 'ANALYZE_PK' (UNIQUE)

*****************************************************************************
***

select count(*)
from
analyze where id in (50000,4321,698754)

call count cpu elapsed disk query current


rows
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
Parse 1 0.00 0.00 0 0 0
0
Execute 1 0.00 0.00 0 0 0
0
Fetch 2 0.00 0.00 0 9 0
1
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
total 4 0.00 0.00 0 9 0
1

Misses in library cache during parse: 0


Optimizer goal: CHOOSE
Parsing user id: 5 (SYSTEM)

Rows Execution Plan


------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
0 CONCATENATION
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)

*****************************************************************************
***

select count(*)
from
analyze where id > 500000

call count cpu elapsed disk query current


rows
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
Parse 1 0.00 0.00 0 0 0
0
Execute 1 0.00 0.00 0 0 0
0
Fetch 2 0.96 0.96 0 933 0
1
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
total 4 0.96 0.96 0 933 0
1

Misses in library cache during parse: 0


Optimizer goal: CHOOSE
Parsing user id: 5 (SYSTEM)

Rows Execution Plan


------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
445545 INDEX (RANGE SCAN) OF 'ANALYZE_PK' (UNIQUE)

*****************************************************************************
***

select count(*)
from
analyze where id = 4321

call count cpu elapsed disk query current


rows
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
Parse 1 0.00 0.00 0 0 0
0
Execute 1 0.00 0.00 0 0 0
0
Fetch 2 0.00 0.00 0 3 0
1
------- ------ -------- ---------- ---------- ---------- ---------- -------
---
total 4 0.00 0.00 0 3 0
1

Misses in library cache during parse: 0


Optimizer goal: CHOOSE
Parsing user id: 5 (SYSTEM)

Rows Execution Plan


------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)

The same queries have now been executed against the test table before and after rebuilding the index.
The results are summarized in the following table:

Before index rebuild After index rebuild % Change

CPU Elapsed CPU Elapsed CPU Elapsed


Query 1 0.47 0.47 0.23 0.23 51% 51%

Query 2 0.00 0.01 0.00 0.00 0% 0%

Query 3 1.11 1.11 0.96 0.96 13.5% 13.5%

Query 4 0.00 0.00 0.00 0.00 0% 0%

The results in the above table show that queries 1 and 3 experienced up to 51% improvement in
execution time. It should be noted that the queries were executed at least once prior to tracing the queries
so that the data would reside in the buffer cache and first time executions of the queries would not skew
the results. This fact explains why the CPU and elapsed times for a query are nearly identical. The data
was read from the buffer cache. Queries 2 and 4 took minimal time to execute. These queries searched
for specific values. Since the data was cached, the CPU and Elapsed times are minimal. Queries 1 and 3
took significantly longer. These queries returned a range of values. This paper does not attempt to
generate any findings on the improvement of different types of queries when rebuilding an index. Rather it
gives a broader statement that query execution times will improve after rebuilding substandard indexes.

Script to find indexes to rebuild

Below is a sample script that can be run to determine which indexes need to be rebuilt. For those indexes
that need to be rebuilt, the ALTER INDEX REBUILD command is dynamically generated as output. The
user can tailor the height and percentage of deleted leaf rows by altering the vMaxHeight and vMaxDel
variables. The output of this script can be spooled to a file. This file can then be run to rebuild the indexes.

-- validate_idx.sql
-- by Brian Peasland
-- Database Administrator
-- Raytheon / EROS Data Center
-- 3 February 2000
--
-- This script will check indexes to find candidates for rebuilding.
-- Run this script in SQL*Plus as a user with SELECT ANY TABLE
-- privileges.
--
-- This script can be used and modified without permission. Run this
-- script at your own risk! The script author is not responsible for
-- any problems that may arise from running this script.

set serveroutput on size 100000

DECLARE
vOwner dba_indexes.owner%TYPE; /* Index Owner */
vIdxName dba_indexes.index_name%TYPE; /* Index Name */
vAnalyze VARCHAR2(100); /* String of Analyze Stmt */
vCursor NUMBER; /* DBMS_SQL cursor */
vNumRows INTEGER; /* DBMS_SQL return rows */
vHeight index_stats.height%TYPE; /* Height of index tree */
vLfRows index_stats.lf_rows%TYPE; /* Index Leaf Rows */
vDLfRows index_stats.del_lf_rows%TYPE; /* Deleted Leaf Rows */
vDLfPerc NUMBER; /* Del lf Percentage */
vMaxHeight NUMBER; /* Max tree height */
vMaxDel NUMBER; /* Max del lf percentage */
CURSOR cGetIdx IS SELECT owner,index_name
FROM dba_indexes WHERE OWNER NOT LIKE 'SYS%';
BEGIN
/* Define maximums. This section can be customized. */
vMaxHeight := 3;
vMaxDel := 20;

/* For every index, validate structure */


OPEN cGetIdx;
LOOP
FETCH cGetIdx INTO vOwner,vIdxName;
EXIT WHEN cGetIdx%NOTFOUND;
/* Open DBMS_SQL cursor */
vCursor := DBMS_SQL.OPEN_CURSOR;
/* Set up dynamic string to validate structure */
vAnalyze := 'ANALYZE INDEX ' || vOwner || '.' || vIdxName || ' VALIDATE
STRUCTURE';
DBMS_SQL.PARSE(vCursor,vAnalyze,DBMS_SQL.V7);
vNumRows := DBMS_SQL.EXECUTE(vCursor);
/* Close DBMS_SQL cursor */
DBMS_SQL.CLOSE_CURSOR(vCursor);
/* Does index need rebuilding? */
/* If so, then generate command */
SELECT height,lf_rows,del_lf_rows INTO vHeight,vLfRows,vDLfRows
FROM INDEX_STATS;
IF vDLfRows = 0 THEN /* handle case where div by zero */
vDLfPerc := 0;
ELSE
vDLfPerc := (vDLfRows / vLfRows) * 100;
END IF;
IF (vHeight > vMaxHeight) OR (vDLfPerc > vMaxDel) THEN
DBMS_OUTPUT.PUT_LINE('ALTER INDEX ' || vOwner || '.' || vIdxName || '
REBUILD;');
END IF;

END LOOP;
CLOSE cGetIdx;
END;
/

Summary

One of the areas lacking in Oracle documentation pertains to rebuilding an index. Oracle’s ANALYZE
INDEX VALIDATE STRUCTURE command provides a nice way to check an index to see if it is a
candidate for rebuilding. This command does not affect the Oracle optimizer’s execution plan for queries
that may use the index. The results in the INDEX_STATS are checked after issuing the VALIDATE
STRUCTURE command. If an index has excessive height (greater than four) or a high number of deleted
leaf rows (over 20% of the total), we rebuild the index.

A test case was studied to show the potential for improvements after rebuilding an index. In one example,
the query executed 51% faster after rebuilding the index. While this example is specific to the test case, it
does show the potential for improvement after rebuilding an index.