
RDBMS Fundamentals: Indexing

Relational data structures: Indexes (examples on Informix Dynamic Server IDS)

May 16, 2012

2009 IBM Corporation

Information Management Informix

Objectives
At the end of this training session, you will be able to:
- Understand the benefits and costs of indexes for databases
- Understand how indexes are implemented
- Identify the different index structures
- Understand the B+ tree
- Understand features and guidelines for using:
  - Implicit versus explicit indexes
  - Unique versus duplicate indexes
  - Simple versus composite indexes
  - Cluster indexes
  - Functional indexes
- Understand decision criteria for index storage:
  - Attached versus detached indexes
  - Index fragmentation
- Select an appropriate fill factor for an index
- Know how to alter, drop, rename and maintain an index
- Understand how indexing works on IBM Informix Dynamic Server (IDS)


Sequential Scans
A sequential scan (or table scan):
- Reads all the pages that belong to the table and returns all the rows of the table
- Starts with the first page of the table and travels in order across all the devices that contain pages of the table until it retrieves the last page
- Is an expensive operation on a large table in OLTP systems
  - Small sequential scans (sequential scans of small tables) are acceptable in OLTP
  - I/O is an expensive (mechanical) operation
  - It also fills the memory buffers with unnecessary pages, which affects buffer usage and the performance of the database server


When do Sequential Scans work well?


- For any kind of access (random or sequential) on very small tables
  - The table fits in just a few pages and can be accessed with minimal I/O
- For reporting-type queries, where all the rows of the table are needed
  - Random access is not an option because the whole table is needed to process the results
- When the result set is large enough, or the filter non-selective enough, that scanning the whole table is cheaper than randomly accessing every row
  - Non-restrictive filters using low-cardinality columns or low-selectivity filters
  - Example: queries using aggregate functions (MAX, MIN, AVG, SUM) and no filters


Sequential Scans: The need for indexes (1)

Imagine this scenario:
- Filter rows in a table based on a condition
  - Lookup case: search for rows meeting a restrictive condition (filter)
  - Just a few rows meet the condition (e.g. <= 5% of the total data)
  - The filter retrieves scattered rows that are not stored together on disk
- We should NOT use a sequential scan for random (nonsequential) access with high-selectivity filters
- The poor performance becomes more evident as the filter becomes more restrictive / selective (for instance, on high-cardinality columns)


Sequential Scans: The need for indexes (2)

Imagine this scenario:
- Join between two tables, at least one of which is large
  - Only a few rows of the tables are related (equi-join)
- The iterations of a nested-loop join severely degrade performance
  - The number of reads is multiplied by the number of rows sequentially scanned
- Performance depends on the order in which the tables are joined and on whether an index is used on any of them


Indexes: The Basics (1)

- An index is a structure or object in the database
- Database indexes are similar to indexes in books or file cabinets
- Indexes provide fast access to the rows of a table that meet a certain condition in a query
  - They minimize I/O and improve performance, especially for random access
- An index is a dynamic structure: it changes as the data in the table changes
- A table can have several indexes, to satisfy several queries


Indexes: The Basics (2)

- Unique indexes are necessary on column(s) that must be unique
- The presence of an index can allow the optimizer to speed up a query
  - The database optimizer decides whether or not to use an index
  - We can force the optimizer to use or avoid an index
- The optimizer can use an index in the following ways:
  - To replace sequential scans of a table with nonsequential (random) access
  - To avoid reading row data when processing expressions that name only indexed columns
  - To avoid a sort (including building a temporary table) when executing GROUP BY and ORDER BY clauses
- An index on the appropriate column can save thousands, tens of thousands or, in extreme cases, even millions of disk operations during a query
- However, indexes entail costs (in space, processing and maintenance)


Query Speed Comparison: Seq scan vs Index scan

Example: Suppose you have two tables:
- tab1, with 200 rows (small table), and tab2, with 500,000 rows (large table)
- Both have a unique index on the joining column
- Assume 1 row per page, so 1 data read (I/O) is needed per row
- For tab1: assume a lookup takes 2 index reads + 1 data read per row => 3 I/Os
- For tab2: assume a lookup takes 3 index reads + 1 data read per row => 4 I/Os

A simple select to find related rows from the two tables is:

SELECT * FROM tab1, tab2 WHERE tab1.col1 = tab2.col2;

Results: depending on the database optimizer's decision, how many I/O (disk) accesses (index + data reads) are needed?
- If it selects from tab1 first and then joins to tab2 using the index:
  1,000 disk reads = 200 for tab1 + 200 x 4 for tab2
- If it selects from tab2 first and then joins to tab1 using the index:
  2 million disk reads! = 500,000 for tab2 + 500,000 x 3 for tab1
- If no indexes are used at all (sequential scan of both tables):
  around 100 million disk reads! = 200 for tab1 x 500,000 for tab2
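The arithmetic behind this comparison can be checked with a short Python sketch. It assumes a simple nested-loop join and the slide's own figures (1 row per page, 2 index reads per lookup on tab1, 3 on tab2); the function names are illustrative and not part of Informix.

```python
# I/O estimates for the tab1/tab2 join, under the slide's assumptions:
# 1 row per page, nested-loop join, unique index on the join column.
TAB1_ROWS = 200
TAB2_ROWS = 500_000
TAB1_LOOKUP_IO = 2 + 1  # 2 index reads + 1 data read per lookup on tab1
TAB2_LOOKUP_IO = 3 + 1  # 3 index reads + 1 data read per lookup on tab2


def indexed_join_io(outer_rows: int, inner_lookup_io: int) -> int:
    """Scan the outer table once, then do one indexed lookup per outer row."""
    return outer_rows + outer_rows * inner_lookup_io


def unindexed_join_io(outer_rows: int, inner_rows: int) -> int:
    """Scan the outer table once and rescan the inner table for every outer row."""
    return outer_rows + outer_rows * inner_rows


tab1_first = indexed_join_io(TAB1_ROWS, TAB2_LOOKUP_IO)
tab2_first = indexed_join_io(TAB2_ROWS, TAB1_LOOKUP_IO)
no_indexes = unindexed_join_io(TAB1_ROWS, TAB2_ROWS)

print(f"tab1 first, index on tab2: {tab1_first:,} reads")
print(f"tab2 first, index on tab1: {tab2_first:,} reads")
print(f"no indexes at all:         {no_indexes:,} reads")
```

Under these assumptions the unindexed join costs about 100 million reads (200 x 500,000), so the join order and the available index change the cost by several orders of magnitude.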


Typical Index Structures


- Binary Search Tree (BST)
- B-Tree
- B+ Tree: the database world's favorite and most widely used index
- Hash
- Bitmap
- R-Tree


B-Tree (1): Analogy with a File Cabinet

[Figure: a file cabinet (organized storage) compared with a B-tree index]


B-Tree (2)
- Allows several keys in one index page / node
  - Keeps similarly valued records together on a disk page
  - This takes better advantage of locality of reference than a binary search tree (BST)
- Allows nodes to be incompletely filled
  - Every node in the tree is full at least to a certain minimum percentage
  - Some space is wasted (nodes are not entirely full)
  - Requires less rebalancing than a BST
- Reduces the number of disk fetches necessary during a search
  - The tree is always in perfect balance, as all leaf nodes are at the same depth
- Keeps data sorted and allows searches, insertions and deletions in logarithmic amortized time
- Unlike a self-balancing BST, it is optimized for systems that read and write large blocks of data
- Grows from the bottom: when a node is over-full, it is split and an entry for the added node is put one level up. Deletions are the reverse of additions

B-Tree (3)
In B-trees:
- The root node points down to branch nodes
- Branch nodes point down to leaf nodes
- Leaf nodes address the actual data rows

But the B-tree has a problem:
- It is inefficient when searching for a range of data values
- It cannot be used to move laterally through the tree, only up and down

If all adjacent sibling nodes were connected, you could scan an entire tree level with minimal effort. This is what B+ trees do, and they are used by some of the major database vendors


B-plus Tree (B+ tree) (1)


- It is an improved B-tree and the preferred index structure
- Widely used in databases; used in Informix
- Levels:
  - The topmost level of the index hierarchy contains a single root page
  - Each branch page has entries that point to pages in the next level of the index and to its immediate peer nodes (the next nodes at the same level)
  - Each leaf page contains a list of index entries that point to rows in the table
- Index entries in leaf pages / nodes are sorted in key-value order
- An index entry consists of a key and one or more row pointers
  - The key is a copy of the indexed columns from one row of data
  - A row pointer (rowid) provides an address used to locate a row that contains the key

[Figure: a sample B+ tree where only forward scan is available]


B-plus Tree (B+ tree) (2): Informix

In Informix, indexes are B+ tree structures.

An index leaf is composed of:
- A key value (the index key)
- Row IDs to find the data row(s)
- A delete flag

An index key contains the value of the column we want to search for
- e.g.: first_name, SSN, zip_code

The index root and branch nodes:
- Are similar to the leaves but, instead of rowids, they hold the IDs of other index nodes (pointers to other nodes)

Each index node / page is doubly linked with its previous and next peer node, to optimize:
- Key-only and range searches
- Both forward and reverse order searches

B+ Tree (3): Informix

In B+ trees:
- The root node points down to branch nodes
- Branch nodes point down, left and right
- Leaf nodes point left, right, and down to the data via rowids
- The ability to move right or left from a node to its adjacent node puts the "plus" in B+ tree
- All IBM Informix indexes for non-multidimensional data are B+ trees


B-plus Tree (B+ tree) (4): Informix

[Figure: B+ tree divided into the index set (root and branch nodes) and the sequence set (linked leaf nodes)]

In Informix, forward (ascending) and reverse-order (descending) index scans are available using one single index. Examples (the second one is a key-only search):

SELECT fname, lname FROM customer ORDER BY customer_num;
SELECT customer_num FROM customer ORDER BY customer_num DESC;

Range searches are improved by this structure too. Examples:

SELECT customer_num, fname, lname FROM customer WHERE customer_num BETWEEN 50 AND 300;
SELECT customer_num FROM customer WHERE customer_num >= 50 AND customer_num <= 300;
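To make the role of the linked leaf level concrete, here is a minimal Python sketch of forward, reverse and range scans over doubly linked leaf pages. It is a toy model, not Informix's page format: the Leaf class and the function names are hypothetical. The scan starts at the first leaf for simplicity, whereas a real index would descend from the root to the leaf that holds the low key.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Leaf:
    """One leaf page: sorted keys plus links to the peer leaves."""
    keys: List[int]
    prev: Optional["Leaf"] = field(default=None, repr=False)
    next: Optional["Leaf"] = field(default=None, repr=False)


def link_leaves(pages: List[List[int]]) -> List[Leaf]:
    """Build the leaf level and doubly link neighbouring pages."""
    leaves = [Leaf(keys) for keys in pages]
    for left, right in zip(leaves, leaves[1:]):
        left.next, right.prev = right, left
    return leaves


def range_scan(first_leaf: Leaf, low: int, high: int) -> Iterator[int]:
    """Forward scan: yield keys in [low, high] by following next pointers."""
    leaf = first_leaf
    while leaf is not None:
        for key in leaf.keys[bisect_left(leaf.keys, low):]:
            if key > high:
                return
            yield key
        leaf = leaf.next


def reverse_scan(last_leaf: Leaf) -> Iterator[int]:
    """Descending scan: walk the same leaves backwards via prev pointers."""
    leaf = last_leaf
    while leaf is not None:
        yield from reversed(leaf.keys)
        leaf = leaf.prev


leaves = link_leaves([[10, 40], [50, 120], [300, 310]])
print(list(range_scan(leaves[0], 50, 300)))
print(list(reverse_scan(leaves[-1])))
```

Because neighbouring leaves are linked in both directions, one index serves both ORDER BY ... and ORDER BY ... DESC, and a BETWEEN filter never has to climb back up the tree.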

B+ Tree Splits
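A leaf split can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: a leaf holds at most `capacity` keys, an over-full leaf is split in half, and the first key of the new right-hand leaf is copied up to the parent as the separator. The function name is hypothetical, and the slide's original figure and Informix's actual split policy are not reproduced here.

```python
from bisect import insort
from typing import List, Optional, Tuple


def insert_into_leaf(
    keys: List[int], key: int, capacity: int
) -> Tuple[List[int], Optional[List[int]], Optional[int]]:
    """Insert `key` into a sorted leaf.

    Returns (left, right, separator). When the leaf still fits in one page,
    right and separator are None. When it overflows, the leaf is split in
    half and the first key of the right half is the separator that the
    parent node would receive.
    """
    keys = list(keys)          # work on a copy; leave the caller's list intact
    insort(keys, key)          # keep the leaf in key order
    if len(keys) <= capacity:
        return keys, None, None
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


print(insert_into_leaf([10, 20, 30], 25, capacity=4))
print(insert_into_leaf([10, 20, 30, 40], 25, capacity=4))
```

If the parent overflows when it receives the separator, the same step repeats one level up, which is how the tree grows from the bottom.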


R-Tree (1)
- An R-tree index is a secondary data structure (or access method) that organizes access to data, similar to a B-tree index
- Used in Informix for multidimensional and spatial data
- The R-tree is specifically designed to index table columns that contain the following types of data:
  - Multidimensional data:
    - Spatial data in two or three dimensions, such as the (X, Y) coordinates of geographical data
    - An extra dimension that represents time could also be included
  - Combinations of numerical values treated as multidimensional values:
    - Such as a configuration for a house that includes the number of stories, the number of bedrooms, the number of baths, the age of the house, and the square footage
  - Range values:
    - Such as the time of a television program (9:00 P.M. to 9:30 P.M.)
- A common real-world usage for an R-tree might be: "Find all museums within 2 miles (3.2 km) of my current location"

R-Tree (2)
Simple example of an R-tree for 2D rectangles


Reading through a B+ tree Index (Index scan)


When you access a row through an index:
- You read the B+ tree starting at the root node and follow the nodes down to the lowest level, which contains the pointer to the data
- In the example, we need 3 index read operations to find the pointer to the data

Keep key size to a minimum, for two reasons:
- To allow a single index page in memory to hold more key values
  - For Informix (IDS), the size of a node is the size of one page
- To have fewer B+ tree levels in the index, which is very important for performance:
  - It reduces the number of read operations necessary to look up several rows
  - An index with a 4-level tree needs 1 more read per row than an index with a 3-level tree
  - If 100,000 rows are read in 1 hour, the 3-level tree needs 100,000 fewer reads to obtain the same data

3-level B+ tree: number of page reads needed to find a row:
- Key-only scan (no need to get the row data page): 3 (index only)
- If we need to get the data pages: 4 (3 index + 1 data)
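The link between key size, tree depth and reads per row can be estimated with a rough Python model. It assumes every index page holds the same number of keys and is completely full, which real indexes are not, so treat the numbers as an illustration; the function names are hypothetical.

```python
def btree_levels(num_keys: int, keys_per_page: int) -> int:
    """Smallest number of levels whose capacity covers num_keys,
    assuming every page holds exactly keys_per_page entries."""
    if num_keys < 1 or keys_per_page < 2:
        raise ValueError("need at least 1 key and 2 keys per page")
    levels = 1
    capacity = keys_per_page
    while capacity < num_keys:
        capacity *= keys_per_page
        levels += 1
    return levels


def reads_per_row(levels: int, key_only: bool) -> int:
    """Page reads to fetch one row: one per index level, plus one data page
    unless the query is satisfied from the index alone."""
    return levels if key_only else levels + 1


small_keys = btree_levels(500_000, keys_per_page=100)  # shorter keys, more per page
large_keys = btree_levels(500_000, keys_per_page=50)   # longer keys, fewer per page
print(small_keys, reads_per_row(small_keys, key_only=False))
print(large_keys, reads_per_row(large_keys, key_only=False))
```

In this model, halving the number of keys that fit on a page is enough to push a 500,000-row index from 3 levels to 4, which is one extra read for every row fetched.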


Placement of an Index: Review of Informix storage

Physical Storage Units:
- Chunk
  - Largest unit of physical disk dedicated to Informix database server data storage
    - Raw device (disk partition), or
    - Regular file system file (cooked file)
  - Provides DBAs with a significantly large unit for allocating disk space
    - Maximum size of an individual chunk is 4 TB
    - Number of allowable chunks is 32,766
  - It is a disk unit: it contains a certain number of pages
  - Pages from different tables / indexes can be allocated in the same chunk
- Page
  - Minimum I/O unit
  - Commonly 2 KB or 4 KB, configurable up to 16 KB


Placement of an Index: Review of Informix storage

Physical Storage Units (cont.):
- Extent
  - Group of contiguous pages within a chunk that store data for a given table, index, table fragment or index fragment
  - When creating a table, you specify its initial extent size and next extent size
    - If not set, the default is 16 KB
  - As the table grows, it can have multiple extents
    - It is important to manage appropriate extent sizes and the number of extents
    - Default size is configurable in the Informix configuration file (ONCONFIG)
  - An extent contains pages of a single table, index or fragment


Placement of an Index: Review of Informix storage

Logical Storage Units:
- Tblspace
  - Logical collection of all the extents (not necessarily contiguous) allocated to a specific table, index or fragment
  - It can include extents stored on a single chunk or on multiple chunks
  - A tblspace, however, is always contained within a single dbspace
- Dbspace
  - Logical collection of chunks
  - Forms a pool of disk space that is used to store databases, tables, indexes or fragments
  - A single dbspace can contain pages of different tables / indexes
  - Special-purpose dbspaces:
    - Root dbspace: stores system catalog information and databases
    - Blobspace: stores simple binary large objects
    - Sbspace: stores smart binary large objects
    - Temporary dbspaces: for non-logged temporary data

Placement of an Index: Review of Informix storage

CREATE DATABASE stores_demo IN dbs1 WITH LOG;
CREATE TABLE customer (customer_num INTEGER, ...) IN dbs2;
CREATE INDEX ix_cust ON customer (customer_num) IN dbs3;

You can place different database objects in different disk spaces to balance the I/O work across the available disk devices and controllers. Example:
- The database stores_demo in a data dbspace dbs1
- The table customer in a data dbspace dbs2
- The index ix_cust in a data dbspace dbs3

Defaults:
- If no dbspace is specified, the database is created in rootdbs (the root dbspace)
- If no dbspace is specified, the table is created in a new tblspace in the dbspace where its database resides
- If no dbspace is specified, the index is created in a new tblspace in the same dbspace as its table
- When an index is created on an empty table, just the root node of the B-tree is created

Attached vs Detached Indexes (1)


- In the past, indexes used to be attached to the table, meaning they did not have a separate tblspace and the index pages were interleaved with the data pages in the table's tblspace
- Now, on Informix, all indexes are detached, meaning index extents are stored separately from table extents (in a new tblspace), even if they are placed within the same dbspace as the table


Attached vs Detached Indexes (2)


- An index can be placed in a separate dbspace. For example, this index is stored in a separate dbspace called cust_ix_dbs:

CREATE INDEX customer_ix ON customer (zipcode) IN cust_ix_dbs;

- By default, index extents are created in the dbspace that holds the data (table) extents
- A detached index can have a fragmentation strategy different from the one used by its table; you set it up explicitly with CREATE INDEX


Bidirectional Traversal of Indexes (1)


The ASC and DESC keywords specify the order in which the index is maintained.

- When creating an index on a column, if you omit or specify the ASC keyword, Informix stores the key values in ascending order
  - This is the default order for a column
  - From the smallest to the largest key
  - Example on customers' last names: Albertson, Beatty, Currie

CREATE INDEX ix_cust ON customer(lname ASC);
or
CREATE INDEX ix_cust ON customer(lname);

- Use the DESC keyword for Informix to store the key values in descending order
  - From the largest to the smallest key
  - Example on customers' last names: Currie, Beatty, Albertson

CREATE INDEX ix_cust ON customer(lname DESC);

Bidirectional Traversal of Indexes (2)


Informix's bidirectional traversal capability lets you create just one index on a column and use that index for queries that sort their results in either ascending or descending order of that column


Implicit vs Explicit Index


- Implicit indexes are created when a constraint (primary key, foreign key, unique constraint) is defined that cannot use an existing index
  - You cannot specify a dbspace location, fragmentation strategy or fill factor for the index
  - Implicit indexes are created in the same dbspace as the database

CREATE TABLE tab1 (col1 INTEGER, col2 INTEGER, col3 CHAR(25), PRIMARY KEY (col1)) IN table1dbs;

- Explicit indexes are created using the CREATE INDEX statement
  - It is recommended to explicitly create indexes that exactly match the referential constraint and then use ALTER TABLE to add the constraint
  - The constraint will use the existing index instead of implicitly creating one

CREATE TABLE tab1 (col1 INTEGER, col2 INTEGER, col3 CHAR(25)) IN table1dbs;
CREATE UNIQUE INDEX index1 ON tab1(col1) FILLFACTOR 70 IN idx1dbs;
ALTER TABLE tab1 ADD CONSTRAINT PRIMARY KEY (col1);

Unique vs Duplicate Index


- Unique indexes allow only one occurrence of a value in the indexed column
  - Created for columns whose values cannot be repeated within the table
  - Used to enforce primary key (PK) uniqueness or unique constraints
  - An index entry is created for each row in the table, which prevents duplicates
  - Ex: customer_num, SSN, employee_id

CREATE UNIQUE INDEX cust_num_ix ON customer(customer_num);
or
CREATE DISTINCT INDEX cust_num_ix ON customer(customer_num);

- A non-unique, duplicate or secondary index is an index based on a non-key attribute, and allows identical values for multiple rows in an indexed column
  - Avoid having highly duplicated indexes
  - Indexes become less effective as they become less unique
  - Ex: city, first_name, last_name, order_date, zipcode

CREATE INDEX cust_lname_ix ON customer(lname);



Simple vs Composite Index


- A simple index lists only one column (or, for IDS, only one column or function) in its index key specification. Example:

CREATE INDEX cust_lname_ix ON customer(lname);

- Any index listing two or more columns is a composite index
  - List the columns in order from most frequently used to least frequently used
  - Facilitates multiple-column joins
  - Increases the uniqueness of indexed values
  - Example:

CREATE INDEX ix_items ON items(manu_code, stock_num);


Taking advantage of a Composite Index


On Informix, the optimizer can use a composite index (one that covers more than one column) in several ways. For example, an index on columns a, b, and c (in that order) can be used in these ways:

CREATE INDEX ix_sample ON sample_table (a, b, c);

- To locate a particular row using a partial-key search:

WHERE a=1
WHERE a>=12 AND a<15
WHERE a=1 AND b < 5
WHERE a=1 AND b = 17 AND c >= 40

  The following examples of filters cannot use that composite index:

WHERE b=10
WHERE c=221
WHERE a>=12 AND b=15

- To replace a table scan by a key-only search, when all of the desired columns are contained within the index
- To join column a, columns ab, or columns abc to another table
- To implement ORDER BY or GROUP BY on columns a, ab or abc, but not on b, c, ac, or bc
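The partial-key rule illustrated by these examples can be written down as a small Python check. This sketch encodes only the rule as the slide presents it: the filtered columns must form a leading prefix of the index, and every column before the last one in that prefix must use an equality filter. It deliberately ignores filters on non-indexed columns and does not model the real optimizer; the function names and the "eq"/"range" labels are illustrative.

```python
from typing import Dict, Sequence


def filter_can_use_index(index_cols: Sequence[str], filters: Dict[str, str]) -> bool:
    """filters maps a column name to "eq" (equality) or "range" (<, >=, BETWEEN...)."""
    if not filters:
        return False
    prefix = list(index_cols[: len(filters)])
    if set(prefix) != set(filters):
        return False  # filtered columns are not a leading prefix of the index
    # Every column before the last one in the prefix must be an equality filter.
    return all(filters[col] == "eq" for col in prefix[:-1])


def sort_can_use_index(index_cols: Sequence[str], order_cols: Sequence[str]) -> bool:
    """ORDER BY / GROUP BY can use the index only on a leading prefix, in order."""
    n = len(order_cols)
    return 0 < n <= len(index_cols) and list(order_cols) == list(index_cols[:n])


INDEX = ("a", "b", "c")
print(filter_can_use_index(INDEX, {"a": "eq", "b": "range"}))   # WHERE a=1 AND b<5
print(filter_can_use_index(INDEX, {"a": "range", "b": "eq"}))   # WHERE a>=12 AND b=15
print(sort_can_use_index(INDEX, ("a", "b")), sort_can_use_index(INDEX, ("a", "c")))
```

The same prefix idea explains why the column order chosen in CREATE INDEX matters: an index on (a, b, c) is of no help to a query that filters only on b or c.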


Cluster Indexes (1)


- Used to physically order the records of the table according to the index, remove extent interleaving (making the pages of the table contiguous), and avoid sorts
- Use a cluster index:
  - In static tables, that is, tables subject to few or no modifications (not in dynamic tables)
  - In a frequently read table where an index is used to read many data rows
- Use CLUSTER to physically reorder the rows of the table as the index designates:
  - Informix rewrites the data rows in the table to match the order of the index
  - Therefore, each table can have only ONE cluster index
- Ex: create an index on the customer table and physically order the rows according to their last name values, in (by default) ascending order:

CREATE CLUSTER INDEX ix_cust ON customer(lname);


Cluster Indexes (2)


- Over time, Informix does not maintain the clustering of the data rows as new rows are inserted or as existing key values are updated:
  - If the table is modified, the benefit of an earlier cluster will disappear
  - Cluster indexes are most effective on relatively static tables
  - They are less effective on very dynamic tables, because rows are added in space-available order, not sequentially
- You can recluster the table to regain performance by issuing another ALTER INDEX ... TO CLUSTER statement on the clustered index:

ALTER INDEX ix_cust TO CLUSTER;

- The TO NOT CLUSTER option drops the cluster attribute on the index without affecting the physical table


Functional Indexes (1)


- A functional index is one in which all keys derive from the results of a function
- The functional index can be a B-tree index, an R-tree index, or a user-defined secondary-access method
- R-tree example: if you have a column of pictures and a function to identify the predominant color, you can create an index on the result of the function
  - Such an index would enable you to quickly retrieve all pictures having the same predominant color without re-executing the function


Functional Indexes (2)


- The function must be a user-defined function (UDF)
- You cannot create a functional index on any built-in function of SQL
  - But you can create a UDF that calls a built-in function and use this UDF as the index key of a functional index
- B-tree examples:
  - Invalid (UPPER is an Informix built-in function):

CREATE INDEX ix1 ON state (UPPER(sname));

  - Valid (define the UDF myupper as NOT VARIANT):

CREATE FUNCTION myupper (v_value CHAR(15))
  RETURNING CHAR(15) WITH (NOT VARIANT);
  DEFINE r_value CHAR(15);
  EXECUTE FUNCTION upper(v_value) INTO r_value;
  RETURN r_value;
END FUNCTION;

CREATE INDEX ix1 ON state (myupper(sname));


Index Fill Factor (1)


- The DBA can specify the percentage of each page / node that the index will fill during index creation. This is the index fill factor
- The index fill factor is not maintained over the life of the index; it applies only when the index is built


Index Fill Factor (2)


- The Informix ONCONFIG FILLFACTOR parameter sets the system default, which is used by all indexes created in the system
  - If FILLFACTOR is not specified, the default is 90
- Unless all table indexes receive the same type of activity, it is recommended to use the FILLFACTOR option in the CREATE INDEX statement. Ex:

CREATE INDEX state_code_idx ON state(code) FILLFACTOR 80;

- A high fill factor produces an initially compact / dense index, providing more efficient caching and reducing the number of pages to read when retrieving rows
  - Use a FILLFACTOR of 100 for tables that receive only selects (read only) or deletes, to minimize the merging and shuffling of pages as keys are removed
- A lower fill factor produces a sparse index (with more pages, and probably more levels, to read), which can delay the need for node (page) splitting and the accompanying performance impact
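The trade-off between a dense and a sparse index can be estimated with a short Python sketch. It assumes a hypothetical page that holds 200 keys when completely full and counts leaf pages only; the function name is illustrative, and actual page capacity depends on key size and page size.

```python
import math


def leaf_pages_at_build(num_keys: int, keys_per_full_page: int, fillfactor: int) -> int:
    """Leaf pages needed when each page is filled only up to `fillfactor` percent."""
    if not 1 <= fillfactor <= 100:
        raise ValueError("fillfactor must be between 1 and 100")
    keys_per_page = max(1, keys_per_full_page * fillfactor // 100)
    return math.ceil(num_keys / keys_per_page)


for ff in (100, 90, 70):
    pages = leaf_pages_at_build(100_000, keys_per_full_page=200, fillfactor=ff)
    free_slots = 200 - 200 * ff // 100
    print(f"FILLFACTOR {ff:3d}: {pages} leaf pages, {free_slots} free key slots per page")
```

A lower fill factor buys free slots that absorb later inserts without page splits, at the price of more pages to store, cache and read.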

Creating indexes
Examples:

CREATE UNIQUE INDEX ix_orders ON orders(order_num) IN idx_dbs;
CREATE INDEX ix_items ON items(manu_code, stock_num);
CREATE UNIQUE CLUSTER INDEX ix_manufact ON manufact(manu_code) FILLFACTOR 80;
CREATE INDEX ix_man_stk ON items(manu_code DESC, stock_num);
CREATE INDEX order_ix1 ON orders(order_num, order_date DESC);


Altering, Dropping, and Renaming Indexes


Examples:
ALTER INDEX ix_man_cd TO CLUSTER;
RENAME INDEX ix_cust TO new_ix_cust;
DROP INDEX ix_stock;


Index Partitioning (Fragmentation)


- Fragmentation is the distribution of the data or index of one table across separate dbspaces (logical groups of disk storage devices)
- In Informix, you can create several partitions of a table / index in the same and/or different dbspaces
- Each fragment is stored in its own tblspace (group of extents)


Advantages of Fragmentation
- Parallel scans and other parallel operations
  - Several disk devices can be read / processed in parallel
- Balanced I/O
- Fragment discrimination
  - We only do I/O in the portion / fragment that contains the data we are asking for
- Higher availability
  - If a dbspace containing a fragment is unavailable, we may still read the data in the other fragments that are up and available
- Scalability (ability to grow in size)
  - There is a limit on the maximum amount of data per fragment / dbspace; by fragmenting the table / index, we give it room to grow further


Types of Distribution Schemes


- Round robin: makes sense for tables, not allowed for indexes

CREATE TABLE table1 (col_1 SERIAL, col_2 CHAR(20))
  FRAGMENT BY ROUND ROBIN IN dbspace1, dbspace2;

- Expression-based: makes sense for tables and indexes

CREATE TABLE t1 (col_1 SERIAL, col_2 CHAR(20))
  FRAGMENT BY EXPRESSION
    col_1 <= 100 IN dbspace1,
    col_1 > 100 AND col_1 < 500 IN dbspace2,
    REMAINDER IN dbspace3;


Fragment by Expression Using Hash Functions


Distribute the data of table1 into 3 fragments using the hash function MOD:

CREATE TABLE table1 (customer_num SERIAL, lname CHAR(20)...)
  FRAGMENT BY EXPRESSION
    MOD(customer_num, 3) = 0 IN dbspace1,
    MOD(customer_num, 3) = 1 IN dbspace2,
    MOD(customer_num, 3) = 2 IN dbspace3;
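How a row finds its fragment under an expression-based scheme can be imitated in Python. This sketch assumes the expressions are tested in the order they are listed, the first match wins, and rows that match nothing go to the remainder fragment. The route function and the two scheme lists are hypothetical; they mirror the range and MOD examples in this deck and do not show how the server itself evaluates fragments.

```python
from typing import Callable, List, Optional, Tuple

Fragment = Tuple[Callable[[int], bool], str]  # (expression, dbspace name)


def route(value: int, fragments: List[Fragment], remainder: Optional[str] = None) -> str:
    """Return the dbspace of the first fragment whose expression matches."""
    for expression, dbspace in fragments:
        if expression(value):
            return dbspace
    if remainder is None:
        raise ValueError(f"no fragment accepts {value} and no remainder is defined")
    return remainder


# Range-style scheme: <= 100, then 101..499, everything else to the remainder.
range_scheme: List[Fragment] = [
    (lambda v: v <= 100, "dbspace1"),
    (lambda v: 100 < v < 500, "dbspace2"),
]

# Hash-style scheme: MOD(customer_num, 3) spreads keys evenly over 3 dbspaces.
mod_scheme: List[Fragment] = [
    (lambda v, r=r: v % 3 == r, f"dbspace{r + 1}") for r in range(3)
]

print(route(250, range_scheme, remainder="dbspace3"))
print(route(500, range_scheme, remainder="dbspace3"))
print([route(n, mod_scheme) for n in range(1, 7)])
```

A range scheme lets a query with a matching filter touch only one fragment, while the MOD scheme trades that for an even spread of rows and I/O across the dbspaces.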


Fragmented/Partitioned Indexes


Fragmentation CREATE INDEX Statement


- By expression:

CREATE INDEX idx1 ON table1(col_1)
  FRAGMENT BY EXPRESSION
    col_1 < 10000 IN dbspace1,
    col_1 >= 10000 IN dbspace2;

- No fragmentation scheme specified:

CREATE INDEX idx1 ON table1(col_1) IN dbspace1;

- Partitioned:

CREATE INDEX idx1 ON table1(col_1)
  PARTITION BY EXPRESSION
    PARTITION part1 col_1 < 10000 IN dbspace1,
    PARTITION part2 col_1 >= 10000 IN dbspace2;


Fragmenting an Index
Discussion: Is this statement valid?

CREATE UNIQUE INDEX ia ON tab1(col1)
  FRAGMENT BY EXPRESSION
    col2 <= 10 IN dbsp1,
    col2 > 10 AND col2 <= 100 IN dbsp2,
    col2 > 100 IN dbsp3;


Parallel Index Build (Informix)


- IDS implements a sophisticated design to enable extremely fast index builds
- This design divides the index build process into three (3) subtasks, for vertical parallelism:
  - First, scan threads read the data from disk
  - Next, the data is passed to the sort threads
  - Finally, the sorted sets are appended into a single index tree
- The indexes are built in parallel even if the Parallel Database Query (PDQ) priority, PDQPRIORITY, is set to 0, but you can tune this environment variable for a performance gain
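The three subtasks can be imitated with a toy Python pipeline: scan each fragment, sort each scanned set, then merge the sorted runs into one ordered list of index entries. This is only a sketch of the scan / sort / merge idea. The row shape and function names are hypothetical, the stages here run one after another rather than overlapping as the server's threads do, and the final list stands in for the index tree.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Tuple

Row = Tuple[int, str]         # (rowid, key value): a hypothetical row shape
IndexEntry = Tuple[str, int]  # (key value, rowid)


def scan(fragment: Iterable[Row]) -> List[IndexEntry]:
    """Scan step: read the rows and emit (key, rowid) pairs."""
    return [(key, rowid) for rowid, key in fragment]


def sort_run(entries: List[IndexEntry]) -> List[IndexEntry]:
    """Sort step: order one scanned set by key."""
    return sorted(entries)


def build_index(fragments: List[List[Row]]) -> List[IndexEntry]:
    """Scan and sort the fragments with worker threads, then merge the runs."""
    with ThreadPoolExecutor() as pool:
        scanned = list(pool.map(scan, fragments))
        runs = list(pool.map(sort_run, scanned))
    # Merge step: combine the sorted runs into a single key-ordered sequence.
    return list(heapq.merge(*runs))


fragments = [
    [(1, "Currie"), (2, "Albertson")],
    [(3, "Beatty")],
]
print(build_index(fragments))
```

Merging already sorted runs is cheap compared with sorting everything at once, which is why the final step can append the sorted sets into a single tree.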


Informix SYSINDEXES System Catalog


The sysindexes table describes each index in the database:

SELECT sysindexes.*
  FROM sysindexes, systables
  WHERE tabname = "items"
    AND systables.tabid = sysindexes.tabid;

Results:

idxname    104_10           <- implicit system-generated index, 104_10
owner      admin
tabid      104              <- table id for table items
idxtype    U                <- unique index, so the index was created to implement a unique constraint
clustered                   <- not clustered
part1      1
part2      2
part3      0
part4      0
...
part16     0
levels     1                <- 1 level
leaves     1.000000000000   <- 1 leaf node / page
nunique    6.000000000000
clust      1.000000000000

3 row(s) retrieved.         <- the items table has 3 indexes defined on it, although only partial results are shown


Steps to Indexing
- Determine whether the index will be attached, detached or fragmented
  - If fragmented, determine the expression
- Determine the FILLFACTOR for each index
- Determine other characteristics of the index, such as:
  - Simple or composite, and the default order of the columns
  - Clustered or not clustered
  - Unique or duplicate
- Calculate the space requirements for the index and for the temporary dbspace used to build the index
- Determine the optimal locations on disk of the index and the temporary dbspaces (defined in DBSPACETEMP)
- Set the PDQPRIORITY and PSORT_NPROCS environment variables
  - To optimize parallel index builds
- Check and optimize configuration parameters if needed:
  - B-tree cleaners (threads that automatically clean up deleted keys)
  - Buffers (to cache I/O of data and index pages)
  - Temporary dbspaces (for sorts in index builds)
  - Parallelism (to speed up index builds and other engine operations)

Estimating Extent Size of an Index


It is possible to estimate the size of an index before it is created

For an attached index, the database server uses the ratio of the index key size to the row size to calculate an extent size for the index:
Index extent size = (index_key_size/table_row_size) * table_extent_size

For a detached index, the database server uses the ratio of the index key size (plus some overhead in bytes) to the rowsize to assign an appropriate extent size for the index:
Detached Index extent size = ((index_key_size + 13)/table_row_size) * table_extent_size

After the index is created, run a query on the system catalog tables, or run oncheck -pe or oncheck -pT mydb:mytab, to get the size of the indexes
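The two extent-size formulas on this slide translate directly into Python. The sketch below assumes sizes in bytes for the key and row and returns the result in whatever unit the table extent size uses; the function names are illustrative, and the 13-byte overhead is the figure quoted on the slide.

```python
def attached_index_extent_size(
    index_key_size: int, table_row_size: int, table_extent_size: float
) -> float:
    """Index extent size = (index_key_size / table_row_size) * table_extent_size."""
    return (index_key_size / table_row_size) * table_extent_size


def detached_index_extent_size(
    index_key_size: int,
    table_row_size: int,
    table_extent_size: float,
    overhead_bytes: int = 13,  # per-key overhead quoted on the slide
) -> float:
    """Detached index extent size =
    ((index_key_size + overhead) / table_row_size) * table_extent_size."""
    return ((index_key_size + overhead_bytes) / table_row_size) * table_extent_size


# Hypothetical table: 4-byte INTEGER key, 128-byte rows, 1024 KB table extent.
print(attached_index_extent_size(4, 128, 1024))
print(detached_index_extent_size(4, 128, 1024))
```

For short keys the fixed overhead dominates: in this example it makes the detached estimate more than four times the attached one.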

Benefits of Indexing
- Use filtering to reduce the number of pages read (I/O)
- Eliminate sorts (implemented as temporary tables)
- Ensure uniqueness of key values (constraints)
- Reduce the number of pages read (key-only reads)


Costs of Indexing

Disk Space Costs

Processing and Maintenance Costs


Summary: Benefits and Costs of Indexing

Benefits
- Faster access
  - Minimizes I/O, avoids implementing sorts
- Enforce constraints and business rules
  - Primary key, foreign keys, unique values

Costs (Overhead)
- Space cost: indexes require disk space
  - There are ways to estimate the size an index will have
- Time cost for maintenance: insert, update and delete operations
  - Both the data and the indexes need to be updated


Indexing Guidelines and Best Practices (1)


- Create an index on:
  - Join columns (used in the WHERE clause for multi-table joins)
  - Highly selective filter columns (columns used in WHERE clauses)
  - Columns frequently used for ordering and grouping (sorting; columns used in ORDER BY and GROUP BY clauses)
    - Also columns used in UNIQUE / DISTINCT clauses
    - And columns used in UNION statements (where we combine queries)
- Avoid highly duplicate indexes
  - Prefer indexes on high-selectivity columns
  - You can use composite indexes to increase index selectivity
- Limit indexes on highly updated tables / columns and on volatile / temporary tables
  - Remember the overhead cost of index maintenance (B-tree cleaner)
  - Limit the number of indexes on tables; use only the ones needed
- Keep key size small
  - Long character strings are not good candidates for indexes
  - Try to use small columns, like SMALLINT, INTEGER, DATE, CHAR(<small n>)
- Keep the number of indexes small
  - Create only the indexes you really need

Indexing Guidelines and Best Practices (2)


Index only if you need to access <= 4 or 5 % of the data in a table
  The alternative to using an index to access row data is to read the entire table sequentially from top to bottom (sequential scan or full table scan)
  Sequential scans are better for queries that require a high percentage of the data in a table
  Remember: Using an index to retrieve rows requires two reads: an index read followed by a table/data read
Avoid indexes on relatively small tables
  Sequential table scans are just fine for small tables
  There is no need to store both table and index data for small tables
Create primary keys (or, even better, an explicit unique index followed by a primary key constraint) for all tables, where possible
  Remember: Even if you don't explicitly create an index for a primary key, foreign key, or unique constraint, Informix will implicitly create one for you
Need a functional index?
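
The reasoning behind the percentage guideline can be sketched with a deliberately crude page-read model. Everything in it is an assumption made for illustration: the function names, the 20 rows per data page, the 200 keys per index leaf page, the 3-level B+ tree, and the worst-case rule that each matching row costs one separate data-page read. It is not the formula the Informix optimizer uses.

```python
import math


def seq_scan_reads(n_rows, rows_per_page):
    """Pages read by a sequential scan: every data page, exactly once."""
    return math.ceil(n_rows / rows_per_page)


def index_reads(n_rows, percent, keys_per_leaf, btree_levels=3):
    """Pages read through an index for `percent` % of the rows.

    Assumed worst case: each matching row sits on a different data page,
    so every row costs one data-page read on top of the index reads.
    """
    matching_rows = n_rows * percent // 100
    branch_reads = btree_levels - 1           # root down to the first leaf
    leaf_reads = math.ceil(matching_rows / keys_per_leaf)
    return branch_reads + leaf_reads + matching_rows


N_ROWS = 1_000_000
ROWS_PER_PAGE = 20      # assumed data-page density
KEYS_PER_LEAF = 200     # assumed index-leaf density

scan_cost = seq_scan_reads(N_ROWS, ROWS_PER_PAGE)
for pct in (1, 4, 5, 20):
    idx_cost = index_reads(N_ROWS, pct, KEYS_PER_LEAF)
    winner = "index" if idx_cost < scan_cost else "sequential scan"
    print(f"{pct:>2}% of rows: index={idx_cost} scan={scan_cost} -> {winner}")
```

Under these assumptions the break-even point falls at roughly one matching row per data page, which for 20 rows per page is about 5 %. Denser or sparser pages, clustered data, or cached pages would move that point, so the 4 or 5 % figure is a rule of thumb and not a hard limit.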

Indexing Guidelines and Best Practices (3)


Use composite indexes to increase uniqueness
  Composite indexes may be needed where single-column values are not unique by themselves
  Remember: In a composite index, the driving column is the first column, and it should be the most selective column in the index
Use clustered indexes to speed up retrieval
  Remember: Only a single cluster index is allowed per table, and it reorders the data rows according to the index. See the guidelines for cluster indexes
Disable indexes before large DML operations
  Massive loads, inserts, deletes, updates
  You can re-enable the indexes and constraints after the large DML operation is finished
Remember: The indexes on a table should be based on the types of queries you expect to run against the table's columns
  Always give priority to the most frequently executed queries
  Having more indexes than you need forces the cost-based query optimizer to do additional work to decide which index to use
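
The role of the first column in a composite index can be illustrated with a small experiment. This is a sketch under stated assumptions: SQLite (from the Python standard library) stands in for Informix, the table orders and the index ix_orders_cust_date are hypothetical, and the helper plan simply collects the text of SQLite's EXPLAIN QUERY PLAN. On Informix the equivalent check would be done with SET EXPLAIN, and its optimizer may choose differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and composite index, for illustration only.
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, amount REAL)"
)
conn.execute(
    "CREATE INDEX ix_orders_cust_date ON orders (customer_id, order_date)"
)


def plan(sql):
    """Return the detail text of SQLite's query plan for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Filter on the first (driving) column: the composite index can be searched.
plan_leading = plan("SELECT * FROM orders WHERE customer_id = 42")

# Filter only on the second column: the key order does not help here,
# so the engine is expected to fall back to scanning the table.
plan_second = plan("SELECT * FROM orders WHERE order_date = '2012-05-16'")

print("filter on customer_id:", plan_leading)
print("filter on order_date: ", plan_second)
```

Because index keys are ordered by the first column and only then by the second, a filter that skips the first column cannot narrow the search. This is why the driving column should be one that the most frequent queries actually filter on.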

Troubleshooting
When an index should be used but is not:

To see if an index is used in a query plan
  On Informix: use SET EXPLAIN or Visual Explain, then analyze the query plan and cost
Check for and repair possible index corruption
  On Informix: oncheck -cI utility (-c: check, -I: indexes)
Update the optimizer's statistics
  On Informix: update statistics command
Check whether the index is enabled or disabled
  If disabled, enable it: set constraints/indexes enabled
Check whether the index is redundant with another index
  Remove unnecessary indexes, re-run statistics as needed
Check the same query on a newer version of the DBMS engine
  If the index is used and the query runs faster, this might indicate a defect in the previous release of the product that is fixed in the newer one
Try forcing the index using an optimizer directive
  If times and costs are better with the forced index, this might be a defect in the DBMS. Collect all the information and call Tech Support


References
IBM Informix Dynamic Server (IDS) 11.50 Information Center (manuals online)
  http://publib.boulder.ibm.com/infocenter/idshelp/v115/index.jsp
IBM IDS manuals available in PDF format
  http://www-01.ibm.com/software/data/informix/pubs/library/
All IBM Informix technical publications and link to Support site
  http://www-01.ibm.com/software/data/informix/pubs/
  http://www-01.ibm.com/support/docview.wss?uid=swg27013894
Look for: Getting Started Guide, Design and Implementation Guide, SQL Tutorial, Performance Guide and Administrator's Guide

Old but good article on fragmentation: Divide and conquer: Using fragmentation for smart data access
  http://www.ibm.com/developerworks/db2/zones/informix/library/techarticle/0206metzger/0206metzger1.html
