Indexing

RDBMS Fundamentals: Indexing
Relational data structures: Indexes (examples on Informix Dynamic Server IDS)
May 16, 2012
2009 IBM Corporation
Information Management Informix
Objectives
At the end of this training session, you will be able to:
- Understand the benefits and costs of indexes for databases
- Understand how indexes are implemented
- Identify the different index structures
- Understand the B+ tree
- Understand features and guidelines to use:
  - Implicit versus explicit indexes
  - Unique versus duplicate indexes
  - Simple versus composite indexes
  - Cluster indexes
  - Functional indexes
- Understand decision criteria for index storage:
  - Attached versus detached indexes
  - Index fragmentation
- Select an appropriate fill factor for an index
- Know how to alter, drop, rename and maintain an index
- Understand how indexing works on IBM Informix Dynamic Server (IDS)
Sequential Scans
A sequential scan (or table scan):
- Reads all the pages that belong to the table and returns all the rows of the table
- Starts with the first page of the table and travels in order across all the devices that contain pages of the table until it retrieves the last page
- A sequential scan of a large table is an expensive operation in OLTP systems
  - I/O is an expensive (mechanical) operation
  - It also fills the memory buffers with unnecessary pages, affecting the buffer profile and the performance of the database server
- Small sequential scans (sequential scans of small tables) are acceptable in OLTP
When do Sequential Scans work well?

- For any kind of access (random or sequential) on very small tables
  - The table fits in just a few pages and can be accessed with minimal I/O
- For reporting-type queries, where all the rows of the table are needed
  - Random access is not an option because the whole table is needed to process the results
- When the result set is large enough or non-selective enough that scanning the whole table is cheaper than randomly accessing every row
  - Non-restrictive filters using low-cardinality columns or low-selectivity filters
  - Example: queries using aggregate functions (max, min, avg, sum) and no filters
Sequential Scans: The need for indexes (1)

Imagine this scenario:
- Filter rows in a table based on a condition
- Lookup case: search for rows that meet a restrictive condition (filter)
- Just a few rows meet the condition (e.g. <= 5% of the total data)
- The filter retrieves scattered rows that are not stored together on disk
We should NOT use a sequential scan for random (nonsequential) access with high-selectivity filters
- The poor performance becomes more evident as the filter becomes more restrictive/selective (for instance, on high-cardinality columns)
Sequential Scans: The need for indexes (2)

Imagine this scenario:
- A join between two tables, at least one of which is large
- The tables are related by just a few rows (equi-join)
- The iterations in a nested-loop join will severely decrease performance
  - The number of reads is multiplied by the number of rows sequentially scanned
- Performance depends on the order in which the tables are joined and on whether an index is used on any of the tables
Indexes: The Basics (1)

- An index is a structure or object in the database
- Database indexes are similar to indexes in books or file cabinets
- Indexes provide fast access to the rows in a table that meet a certain condition in a query
  - They minimize I/O and improve performance, especially for random access
- An index is a dynamic structure: it changes as the data in the table changes
- A table can have several indexes, to satisfy several queries
Indexes: The Basics (2)

- Unique indexes are necessary on column(s) that must be unique
- The presence of an index can allow the optimizer to speed up a query
  - The database optimizer decides whether to use an index or not
  - We can force the optimizer to use or avoid an index
- The optimizer can use an index in the following ways:
  - To replace sequential scans of a table with nonsequential (random) access
  - To avoid reading row data when processing expressions that name only indexed columns
  - To avoid a sort (including building a temporary table) when executing GROUP BY and ORDER BY clauses
- An index on the appropriate column can save thousands, tens of thousands, or in extreme cases even millions of disk operations during a query
- However, indexes entail costs (in space, processing and maintenance)
Query Speed Comparison: Seq scan vs Index scan

Example: Suppose you have two tables:
- tab1, with 200 rows (small table), and tab2, with 500,000 rows (large table)
- Both have a unique index on the joining column
- Assume 1 row per page, so 1 data read (I/O) is needed per row
- For tab1: assume a lookup takes 2 index reads + 1 data read per row => 3 I/Os
- For tab2: assume a lookup takes 3 index reads + 1 data read per row => 4 I/Os
A simple select to find related rows from the two tables is:
    SELECT * FROM tab1, tab2 WHERE tab1.col1 = tab2.col2;

Results: the number of I/O (disk) accesses (index + data reads) needed depends on the database optimizer's decision:
- If it selects from tab1 first and then joins to tab2 using the index: 1,000 disk reads = 200 for tab1 + 200 x 4 for tab2
- If it selects from tab2 first and then joins to tab1 using the index: 2 million disk reads = 500,000 for tab2 + 500,000 x 3 for tab1
- If no indexes are used at all (sequential scan of both tables): around 100 million disk reads = 200 for tab1 x 500,000 for tab2
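The arithmetic behind the three join plans can be checked with a short Python sketch. The row counts and per-lookup I/O costs are the slide's own assumptions; the function and variable names are hypothetical and only serve the illustration.

```python
# Figures taken from the example's assumptions (1 row per page).
TAB1_ROWS = 200          # small table
TAB2_ROWS = 500_000      # large table
TAB1_LOOKUP_IO = 3       # 2 index reads + 1 data read
TAB2_LOOKUP_IO = 4       # 3 index reads + 1 data read


def indexed_join_io(outer_rows: int, inner_lookup_io: int) -> int:
    """Read the outer table once, then do one index lookup per outer row."""
    return outer_rows + outer_rows * inner_lookup_io


def unindexed_join_io(outer_rows: int, inner_rows: int) -> int:
    """Nested loop without indexes: rescan the inner table for every outer row."""
    return outer_rows * inner_rows


plans = {
    "tab1 first, index on tab2": indexed_join_io(TAB1_ROWS, TAB2_LOOKUP_IO),
    "tab2 first, index on tab1": indexed_join_io(TAB2_ROWS, TAB1_LOOKUP_IO),
    "no indexes": unindexed_join_io(TAB1_ROWS, TAB2_ROWS),
}

for name, reads in plans.items():
    print(f"{name}: {reads:,} disk reads")
```

The gap between the first and last plan is five orders of magnitude, which is why the join order and index choice made by the optimizer matter so much.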
Typical Index Structures

- Binary Search Tree (BST)
- B-Tree
- B+ Tree: the database world's favorite and most widely used index
- Hash
- Bitmap
- R-Tree
B-Tree (1): Analogy with a File Cabinet

[Figure: a file cabinet (organized storage) compared with a B-tree index]
B-Tree (2)
- Allows several keys in one index page/node
  - Keeps similar-valued records together on a disk page
  - This takes advantage of locality of reference, unlike binary search trees (BST)
- Allows nodes to be incompletely filled
  - Every node in the tree is filled to at least a certain minimum percentage
  - Some space is wasted (nodes are not entirely full)
  - Requires less rebalancing than binary search trees (BST)
- Reduces the number of disk fetches necessary during a search
- Is always in perfect balance: all leaf nodes are at the same depth
- Keeps data sorted and allows searches, insertions, and deletions in logarithmic amortized time
- Unlike a self-balancing BST, it is optimized for systems that read and write large blocks of data
- Grows from the bottom: when a node is over-full, it is split in two and an entry for the new node is added one level up. Deletions are the reverse of additions
B-Tree (3)
In B-trees:
- The root node points down to branch nodes
- Branch nodes point down to leaf nodes
- Leaf nodes address the actual data rows
But the B-tree has a problem:
- It is inefficient for searching a range of data values
- It cannot be used to move laterally through the tree, only up and down
If all adjacent sibling nodes were connected, you could scan an entire tree level with minimum effort. This is done by B+ trees, used by some of the major database vendors
B-plus Tree (B+ tree) (1)

- It is an improved B-tree and the preferred index structure
  - Widely used in databases; used in Informix
- Levels:
  - The topmost level of the index hierarchy contains a single root page
  - Each branch page has entries that point to pages in the next level of the index and to its immediate peer nodes (the next nodes at the same level)
  - Each leaf page contains a list of index entries that point to rows in the table
- Index entries in leaf pages/nodes are sorted in key-value order
- An index entry consists of a key and one or more row pointers
  - The key is a copy of the indexed columns from one row of data
  - A row pointer (rowid) provides an address used to locate a row that contains the key
[Figure: a sample B+ tree where only forward scan is available]
B-plus Tree (B+ tree) (2) Informix

In Informix: B+ tree index structures
- An index leaf is made up of:
  - A key value (the index key)
  - Row IDs to find the data row(s)
  - A delete flag
- The index root and branch nodes:
  - Are similar to the leaves, but instead of rowids they hold the IDs of other index nodes (pointers to other nodes)
- Each index node/page is doubly linked with its previous and next peer index node, to optimize:
  - Key-only and range searches
  - Both forward-order and reverse-order searches
- An index key contains the value of the column we want to search for
  - e.g.: first_name, SSN, zip_code
B+Tree (3) Informix

In B+ trees:
- The root node points down to branch nodes
- Branch nodes point down, left and right
- Leaf nodes point left, right, and down to the data via rowids
- The ability to move right or left from a node to its adjacent node puts the "plus" in B+ tree
- All IBM Informix indexes for non-multidimensional data are B+ trees
B-plus Tree (B+ tree) (4) Informix
[Figure: B+ tree with its index set and sequence set]
In Informix, both forward (ascending) and reverse-order (descending) index scans are available using one single index. Examples (the second one is a key-only search):
    SELECT fname, lname FROM customer ORDER BY customer_num;
    SELECT customer_num FROM customer ORDER BY customer_num DESC;
Range searches are improved by this structure too. Examples:
    SELECT customer_num, fname, lname FROM customer WHERE customer_num BETWEEN 50 AND 300;
    SELECT customer_num FROM customer WHERE customer_num >= 50 AND customer_num <= 300;
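To make the role of the sibling links concrete, here is a minimal Python sketch of a range scan over doubly linked leaf pages. It is a teaching model only, not the Informix implementation: the `Leaf` class, `build_leaf_chain`, `find_leaf` and `range_scan` are hypothetical names, keys are plain integers, and the root-to-leaf descent is replaced by a binary search over the first key of each leaf.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)
class Leaf:
    keys: list                                   # sorted keys stored in this leaf
    prev: Optional["Leaf"] = field(default=None, repr=False)
    next: Optional["Leaf"] = field(default=None, repr=False)


def build_leaf_chain(sorted_keys: list, keys_per_leaf: int) -> list:
    """Pack sorted keys into leaves and link each leaf to both neighbours."""
    leaves = [Leaf(sorted_keys[i:i + keys_per_leaf])
              for i in range(0, len(sorted_keys), keys_per_leaf)]
    for left, right in zip(leaves, leaves[1:]):
        left.next, right.prev = right, left
    return leaves


def find_leaf(leaves: list, key: int) -> Leaf:
    """Stand-in for the root-to-leaf descent: pick the leaf that could hold key."""
    first_keys = [leaf.keys[0] for leaf in leaves]
    return leaves[max(bisect_right(first_keys, key) - 1, 0)]


def range_scan(leaves: list, low: int, high: int,
               descending: bool = False) -> Iterator[int]:
    """Yield keys in [low, high], walking sibling links instead of re-descending."""
    leaf = find_leaf(leaves, high if descending else low)
    while leaf is not None:
        for key in (reversed(leaf.keys) if descending else leaf.keys):
            past_end = key < low if descending else key > high
            if past_end:
                return
            if low <= key <= high:
                yield key
        leaf = leaf.prev if descending else leaf.next


leaves = build_leaf_chain(list(range(10, 400, 10)), keys_per_leaf=4)
ascending = list(range_scan(leaves, 50, 300))
descending = list(range_scan(leaves, 50, 300, descending=True))
```

One descent finds the starting leaf; after that the scan only follows `next` (or `prev`) pointers, which is why a single index can serve both ascending and descending range queries.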
B+ Tree Splits
R-Tree (1)
- An R-tree index is a secondary data structure (or access method) that organizes access to data, similar to a B-tree index
  - Used in Informix for multidimensional and spatial data
- The R-tree is specifically designed to index table columns that contain the following types of data:
  - Multidimensional data:
    - Spatial data in two or three dimensions, such as (X, Y) coordinates of geographical data
      - An extra dimension that represents time could also be included
    - Combinations of numerical values treated as multidimensional values, such as a configuration for a house that includes the number of stories, the number of bedrooms, the number of baths, the age of the house, and the square footage
  - Range values, such as the time of a television program (9:00 P.M. to 9:30 P.M.)
- A common real-world usage for an R-tree might be: "Find all museums within 2 miles (3.2 km) of my current location"
R-Tree (2)
Simple example of an R-tree for 2D rectangles
Reading through a B+ tree Index (Index scan)

When you access a row through an index:
- You read the B+ tree starting at the root node and follow the nodes down to the lowest level, which contains the pointer to the data
  - In the example, we need 3 index read operations to find the pointer to the data
- Keep the key size to a minimum for two reasons:
  - To allow a single index page in memory to hold more key values
    - For Informix (IDS), the size of a node is the size of one page
  - To have fewer B+ tree levels in the index, which is very important for performance:
    - Fewer levels reduce the number of read operations necessary to look up several rows
    - An index with a 4-level tree needs 1 more read per row than an index with a 3-level tree
    - If 100,000 rows are read in 1 hour, the 3-level index needs 100,000 fewer reads to obtain the same data
3-level B+ tree: number of page reads needed to find a row:
- Key-only scan (no need to get the row data page): 3 (index only)
- If we need to get the data page: 4 (3 index + 1 data)
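The link between key size and tree depth can be sketched with a rough estimate in Python. This is a simplified model, not the Informix sizing rule: it assumes every page is completely full, ignores page headers, and uses a hypothetical 2 KB page and a hypothetical 8 bytes of per-entry overhead.

```python
def keys_per_page(page_size: int, key_size: int, entry_overhead: int) -> int:
    """Entries that fit in one index page (page headers ignored)."""
    return page_size // (key_size + entry_overhead)


def btree_levels(num_keys: int, fanout: int) -> int:
    """Smallest number of levels whose leaf level can hold num_keys entries."""
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


PAGE_SIZE = 2048        # assumption: 2 KB pages
ENTRY_OVERHEAD = 8      # assumption: bytes of pointer/flag overhead per entry
ROWS = 1_000_000

for key_size in (4, 100):
    fanout = keys_per_page(PAGE_SIZE, key_size, ENTRY_OVERHEAD)
    levels = btree_levels(ROWS, fanout)
    print(f"{key_size}-byte key: {fanout} keys/page, {levels} levels, "
          f"{levels} reads key-only, {levels + 1} reads with the data page")
```

Under these assumptions a 4-byte key gives a 3-level tree for one million rows, while a 100-byte key needs 5 levels, so every lookup costs two extra page reads.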
Placement of an Index: Review of Informix storage

Physical Storage Units:
- Chunk
  - Largest unit of physical disk dedicated to Informix database server data storage
    - Raw device (disk partition), or
    - Regular file system file (cooked file)
  - Provides DBAs with a significantly large unit for allocating disk space
  - Maximum size of an individual chunk is 4 TB
  - Maximum number of chunks is 32,766
  - It is a disk unit: it contains a certain number of pages
  - Pages from different tables/indexes can be allocated in the same chunk
- Page
  - Minimum I/O unit
  - Commonly 2 KB or 4 KB, configurable up to 16 KB

Physical Storage Units (cont.):
- Extent
  - Group of contiguous pages within a chunk that store data for a given table, index, table fragment or index fragment
  - When creating a table, you specify its initial extent size and next extent size
  - As the table grows, it can have multiple extents
  - It is important to manage extent sizes and the number of extents appropriately
  - Default size is configurable in the Informix configuration file (ONCONFIG)
    - If not set, the default is 16 KB
  - An extent contains pages of a single table, index or fragment

Logical Storage Units:
- Tblspace
  - Logical collection of all the extents (not necessarily contiguous) allocated to a specific table, index or fragment
  - It can include extents stored on a single chunk or on multiple chunks
  - A tblspace, however, is always contained within a single dbspace
- Dbspace
  - Logical collection of chunks
  - Forms a pool of disk space that is used to store databases, tables, indexes or fragments
  - A single dbspace can contain pages of different tables/indexes
  - Special-purpose dbspaces:
    - Root dbspace: stores system catalog information and databases
    - Blobspace: stores simple binary large objects
    - Sbspace: stores smart binary large objects
    - Temporary dbspaces: for non-logged temporary data

    CREATE DATABASE stores_demo IN dbs1 WITH LOG;
    CREATE TABLE customer (customer_num INTEGER, ...) IN dbs2;
    CREATE INDEX ix_cust ON customer (customer_num) IN dbs3;

You can place different database objects in different disk spaces:
- To balance I/O work across the available disk devices and controllers
- Example:
  - The database stores_demo in data dbspace dbs1
  - The table customer in data dbspace dbs2
  - The index ix_cust in data dbspace dbs3
- If no dbspace is specified, the database is created in rootdbs (the root dbspace)
- If no dbspace is specified, the table is created in a new tblspace in the dbspace where its database resides
- If no dbspace is specified, the index is created in a new tblspace in the same dbspace as its table
- When an index is created on an empty table, just the root node of the B-tree is created
Attached vs Detached Indexes (1)

- In the past, indexes used to be attached to the table, meaning they did not have a separate tblspace and the index pages were interleaved with the data pages in the table's tblspace
- Now, on Informix, all indexes are detached, meaning index extents are stored separately from table extents (in a new tblspace), even if they are placed in the same dbspace as the table
Attached vs Detached Indexes (2)

- An index can be placed in a separate dbspace. For example, this index is stored in a separate dbspace called cust_ix_dbs:
    CREATE INDEX customer_ix ON customer (zipcode) IN cust_ix_dbs;
- By default, index extents are created in the dbspace that holds the data (table) extents
- A detached index can have a fragmentation strategy different from the one used by its table, which you set up explicitly with CREATE INDEX
Bidirectional Traversal of Indexes (1)

- The ASC and DESC keywords specify the order in which the index is maintained
- When creating an index on a column, if you omit or specify the ASC keyword, Informix stores the key values in ascending order
  - This is the default order for a column
  - From the smallest to the largest key
  - Example on customers' last names: Albertson, Beatty, Currie
    CREATE INDEX ix_cust ON customer(lname ASC);
    or
    CREATE INDEX ix_cust ON customer(lname);
- Use the DESC keyword for Informix to store the key values in descending order
  - From the largest to the smallest key
  - Example on customers' last names: Currie, Beatty, Albertson
    CREATE INDEX ix_cust ON customer(lname DESC);
Bidirectional Traversal of Indexes (2)

Informix's bidirectional traversal capability lets you create just one index on a column and use that index for queries that sort the results in either ascending or descending order of that column
Implicit vs Explicit Index

- Implicit indexes are created when a constraint (primary key, foreign key, unique constraint) is defined that cannot use an existing index
  - You cannot specify a dbspace location, fragmentation strategy or fill factor for the index
  - Implicit indexes are created in the same dbspace as the database
    CREATE TABLE tab1 (
        col1 INTEGER,
        col2 INTEGER,
        col3 CHAR(25),
        PRIMARY KEY (col1)) IN table1dbs;
- Explicit indexes are created using the CREATE INDEX statement
  - It is recommended to explicitly create indexes that exactly match the constraint and then use ALTER TABLE to add the constraint
  - The constraint will use the existing index instead of implicitly creating one
    CREATE TABLE tab1 (
        col1 INTEGER,
        col2 INTEGER,
        col3 CHAR(25)) IN table1dbs;
    CREATE UNIQUE INDEX index1 ON tab1(col1) FILLFACTOR 70 IN idx1dbs;
    ALTER TABLE tab1 ADD CONSTRAINT PRIMARY KEY (col1);
Unique vs Duplicate Index

- Unique indexes allow only one occurrence of a value in the indexed column
  - Created for columns whose values cannot be repeated within the table
  - Used to enforce primary key (PK) uniqueness or unique constraints
  - An index entry is created for each row in the table, and duplicates are prevented
  - Ex: customer_num, SSN, employee_id
    CREATE UNIQUE INDEX cust_num_ix ON customer(customer_num);
    or
    CREATE DISTINCT INDEX cust_num_ix ON customer(customer_num);
- A non-unique (duplicate, or secondary) index is based on a non-key attribute and allows identical values for multiple rows in the indexed column
  - Avoid highly duplicate indexes
    - Indexes become less effective as they become less unique
  - Ex: city, first_name, last_name, order_date, zipcode
    CREATE INDEX cust_lname_ix ON customer(lname);

Simple vs Composite Index

- A simple index lists only one column (or, for IDS, only one column or function) in its index key specification. Example:
    CREATE INDEX cust_lname_ix ON customer(lname);
- Any index listing two or more columns is a composite index
  - List the columns in order from most frequently used to least frequently used
  - Facilitates multiple-column joins
  - Increases the uniqueness of indexed values
  - Example:
    CREATE INDEX ix_items ON items(manu_code, stock_num);
Taking advantage of a Composite Index

On Informix, the optimizer can use a composite index (one that covers more than one column) in several ways. Given an index on columns a, b, and c (in that order):
    CREATE INDEX ix_sample ON sample_table (a, b, c);
you can use it in these ways:
- To locate a particular row using a partial-key search:
    WHERE a=1
    WHERE a>=12 AND a<15
    WHERE a=1 AND b < 5
    WHERE a=1 AND b = 17 AND c >= 40
  The following examples of filters cannot use that composite index:
    WHERE b=10
    WHERE c=221
    WHERE a>=12 AND b=15
- To replace a table scan by a key-only search, when all of the desired columns are contained within the index
- To join column a, columns ab, or columns abc to another table
- To implement ORDER BY or GROUP BY on columns a, ab or abc, but not on b, c, ac, or bc
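The leading-column rule behind these examples can be expressed as a small Python check. This is a deliberately simplified model of the rule, not the Informix optimizer: `can_use_composite_index` is a hypothetical helper, each filtered column is tagged only as an equality (`"eq"`) or a range (`"range"`), and a filter counts as usable only if every filtered column can be served by the index key.

```python
def can_use_composite_index(index_columns, filters):
    """Return True if all filters fit a leading prefix of the index key.

    filters maps column name -> "eq" or "range". Equality filters may chain
    through the key; the first range filter ends the usable prefix.
    """
    remaining = dict(filters)
    for column in index_columns:
        kind = remaining.pop(column, None)
        if kind is None:        # gap in the key: later columns cannot help
            break
        if kind == "range":     # a range ends the usable prefix
            break
    return bool(filters) and not remaining


INDEX = ("a", "b", "c")

usable = [
    {"a": "eq"},                            # WHERE a=1
    {"a": "range"},                         # WHERE a>=12 AND a<15
    {"a": "eq", "b": "range"},              # WHERE a=1 AND b<5
    {"a": "eq", "b": "eq", "c": "range"},   # WHERE a=1 AND b=17 AND c>=40
]
not_usable = [
    {"b": "eq"},                            # WHERE b=10
    {"c": "eq"},                            # WHERE c=221
    {"a": "range", "b": "eq"},              # WHERE a>=12 AND b=15
]

for f in usable + not_usable:
    print(f, can_use_composite_index(INDEX, f))
```

The sketch returns True for the four partial-key filters and False for the three that skip the leading column or place a filter behind a range.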
Cluster Indexes (1)

- Used to physically order the records of the table according to the index, remove extent interleaving (making the pages of the table contiguous), and avoid sorts
- Use a cluster index:
  - In static tables, that is, tables subject to few or no modifications (not in dynamic tables)
  - In a frequently read table where an index is used to read many data rows
- Use CLUSTER to physically reorder the rows of the table as the index designates:
  - Informix rewrites the data rows in the table to match the order of the index
  - Therefore, each table can have only ONE cluster index
- Ex: Create an index on the customer table and physically order the rows by last name, in (by default) ascending order:
    CREATE CLUSTER INDEX ix_cust ON customer(lname);
Cluster Indexes (2)

- Over time, Informix does not maintain the clustering of the data rows as new rows are inserted or existing key values are updated:
  - If the table is modified, the benefit of an earlier cluster will disappear
  - Cluster indexes are most effective on relatively static tables
  - They are less effective on very dynamic tables because rows are added in space-available order, not sequentially
- You can recluster the table to regain performance by issuing another ALTER INDEX ... TO CLUSTER statement on the clustered index:
    ALTER INDEX ix_cust TO CLUSTER;
- The TO NOT CLUSTER option drops the cluster attribute on the index without affecting the physical table
Functional Indexes (1)

- A functional index is one in which all keys derive from the results of a function
- The functional index can be a B-tree index, an R-tree index, or a user-defined secondary-access method
- R-tree example: if you have a column of pictures and a function to identify the predominant color, you can create an index on the result of the function
  - Such an index would enable you to quickly retrieve all pictures having the same predominant color without re-executing the function
Functional Indexes (2)

- The function must be a user-defined function (UDF)
  - You cannot create a functional index on any built-in SQL function
  - But you can create a UDF that calls a built-in function and use this UDF as the index key of a functional index
- B-tree examples:
  - Invalid (UPPER is an Informix built-in function):
      CREATE INDEX ix1 ON state (UPPER(sname));
  - Valid (define the UDF myupper as NOT VARIANT):
      CREATE FUNCTION myupper (v_value CHAR(15))
          RETURNING CHAR(15) WITH (NOT VARIANT);
          DEFINE r_value CHAR(15);
          EXECUTE FUNCTION upper(v_value) INTO r_value;
          RETURN r_value;
      END FUNCTION;
      CREATE INDEX ix1 ON state (myupper(sname));
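Conceptually, a functional index stores the function's result as the key, so a lookup never has to re-run the function on every row. The following Python sketch models that idea with a plain dictionary. It is not how Informix stores the index: `build_functional_index` and the sample rows are hypothetical, and the Python `myupper` merely mirrors the UDF of the same name.

```python
from collections import defaultdict


def myupper(value: str) -> str:
    """Deterministic (not variant) function: same input, same output."""
    return value.upper()


def build_functional_index(rows, key_function):
    """Map key_function(value) -> list of rowids, computed once at build time."""
    index = defaultdict(list)
    for rowid, value in rows:
        index[key_function(value)].append(rowid)
    return dict(index)


# Hypothetical (rowid, sname) pairs standing in for the state table.
state_rows = [(1, "Texas"), (2, "texas"), (3, "Ohio")]
ix1 = build_functional_index(state_rows, myupper)

# A filter such as myupper(sname) = 'TEXAS' becomes a single key lookup.
matching_rowids = ix1.get("TEXAS", [])
```

The function has to be deterministic for this to work: if it could return different results for the same input, the stored keys would no longer match what a query computes, which is why the UDF is declared NOT VARIANT.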
Index Fill Factor (1)

- The DBA can specify the percentage of each page/node that the index will fill during index creation. This is the index fill factor
- The index fill factor is not maintained over the life of the index. It applies only when the index is built
Index Fill Factor (2)

- The Informix ONCONFIG parameter FILLFACTOR sets the system default and is used by all indexes created in the system
  - If FILLFACTOR is not specified, the default is 90
- Unless all table indexes receive the same type of activity, it is recommended to use the FILLFACTOR option in the CREATE INDEX statement. Ex:
    CREATE INDEX state_code_idx ON state(code) FILLFACTOR 80;
- A high fill factor produces an initially compact/dense index, providing more efficient caching and reducing the number of pages to read when retrieving rows
  - Use a FILLFACTOR of 100 for tables that receive only selects (read only) or deletes, to minimize the merging and shuffling of pages as keys are removed
- A lower fill factor produces a sparse index (with more pages, and probably more levels, to read), which can delay the need for node (page) splitting and the accompanying performance impact
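The trade-off can be put into numbers with a rough Python sketch. It is an illustrative model only: it assumes a hypothetical page that holds 100 keys when completely full and considers only leaf pages at build time.

```python
import math


def keys_per_page_at_build(keys_per_full_page: int, fill_factor_pct: int) -> int:
    """Keys placed on each page when the index is built."""
    return max(1, keys_per_full_page * fill_factor_pct // 100)


def leaf_pages_at_build(num_keys: int, keys_per_full_page: int,
                        fill_factor_pct: int) -> int:
    """Leaf pages needed to hold num_keys at the given fill factor."""
    per_page = keys_per_page_at_build(keys_per_full_page, fill_factor_pct)
    return math.ceil(num_keys / per_page)


def free_slots_per_page(keys_per_full_page: int, fill_factor_pct: int) -> int:
    """Room left on each page for later inserts before a split is needed."""
    return keys_per_full_page - keys_per_page_at_build(keys_per_full_page,
                                                       fill_factor_pct)


NUM_KEYS = 100_000
FULL_PAGE = 100   # assumption: 100 keys fit in a completely full page

for ff in (100, 90, 70):
    print(f"FILLFACTOR {ff}: "
          f"{leaf_pages_at_build(NUM_KEYS, FULL_PAGE, ff)} leaf pages, "
          f"{free_slots_per_page(FULL_PAGE, ff)} free slots per page")
```

In this model, going from a fill factor of 100 to 70 grows the leaf level from 1,000 to 1,429 pages, but leaves 30 free slots per page to absorb inserts before any page has to split.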
Creating indexes
Examples:
    CREATE UNIQUE INDEX ix_orders ON orders(order_num) IN idx_dbs;
    CREATE INDEX ix_items ON items(manu_code, stock_num);
    CREATE UNIQUE CLUSTER INDEX ix_manufact ON manufact(manu_code) FILLFACTOR 80;
    CREATE INDEX ix_man_stk ON items(manu_code DESC, stock_num);
    CREATE INDEX order_ix1 ON orders (order_num, order_date DESC);
Altering, Dropping, and Renaming Indexes

Examples:
    ALTER INDEX ix_man_cd TO CLUSTER;
    RENAME INDEX ix_cust TO new_ix_cust;
    DROP INDEX ix_stock;
Index Partitioning (Fragmentation)

- Fragmentation is the distribution of the data or index of one table across separate dbspaces (logical groups of disk storage devices)
- In Informix, you can create several partitions of a table/index in the same and/or different dbspaces
- Each fragment is stored in its own tblspace (group of extents)
Advantages of Fragmentation
- Parallel scans and other parallel operations
  - Several disk devices can be read/processed in parallel
- Balanced I/O
  - Fragment elimination: we only do I/O on the portion/fragment that contains the data we are asking for
- Higher availability
  - If a dbspace containing a fragment is unavailable, we may still read the data in the other fragments that are up and available
- Scalability (ability to grow in size)
  - There is a limit on the maximum amount of data per fragment/dbspace; by fragmenting the table/index, we give it room to grow further
Types of Distribution Schemes

- Round robin: makes sense for tables, not allowed for indexes
    CREATE TABLE table1(
        col_1 SERIAL,
        col_2 CHAR(20))
    FRAGMENT BY ROUND ROBIN IN dbspace1, dbspace2;
- Expression-based: makes sense for tables and indexes
    CREATE TABLE t1(
        col_1 SERIAL,
        col_2 CHAR(20))
    FRAGMENT BY EXPRESSION
        col_1 <= 100 IN dbspace1,
        col_1 > 100 AND col_1 < 500 IN dbspace2,
        REMAINDER IN dbspace3;
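How an expression-based scheme routes rows, and how it lets a query skip fragments, can be modelled in a few lines of Python. The sketch reuses the boundaries of the t1 example; the helper names `fragment_for` and `fragments_to_scan` are hypothetical, and real fragment elimination is performed by the database server, not by application code.

```python
# Fragment expressions of table t1, evaluated in order; REMAINDER catches the rest.
FRAGMENTS = [
    ("dbspace1", lambda v: v <= 100),
    ("dbspace2", lambda v: 100 < v < 500),
]
REMAINDER = "dbspace3"


def fragment_for(value: int) -> str:
    """Return the dbspace whose expression matches the value."""
    for dbspace, matches in FRAGMENTS:
        if matches(value):
            return dbspace
    return REMAINDER


def fragments_to_scan(filter_values) -> list:
    """Fragments that can contain rows for an equality / IN-list filter."""
    return sorted({fragment_for(v) for v in filter_values})


print(fragment_for(100), fragment_for(101), fragment_for(500))
print(fragments_to_scan([5, 42]))       # only one fragment needs I/O
print(fragments_to_scan([5, 250, 900]))
```

A filter such as `col_1 IN (5, 42)` touches only dbspace1 in this model, which is the I/O saving that the fragmentation advantages describe.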
Fragment by Expression Using Hash Functions

Distribute the data of table1 into 3 fragments using the hash function MOD:
    CREATE TABLE table1(
        customer_num SERIAL,
        lname CHAR(20), ...)
    FRAGMENT BY EXPRESSION
        MOD(customer_num, 3) = 0 IN dbspace1,
        MOD(customer_num, 3) = 1 IN dbspace2,
        MOD(customer_num, 3) = 2 IN dbspace3;
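A quick Python sketch shows why a MOD expression on a serial column spreads rows evenly. The 3,000 sequential customer numbers are an assumed sample, and `hash_fragment` is a hypothetical helper that mirrors the three MOD expressions.

```python
from collections import Counter

# Index 0, 1, 2 corresponds to MOD(customer_num, 3) = 0, 1, 2.
DBSPACES = ["dbspace1", "dbspace2", "dbspace3"]


def hash_fragment(customer_num: int) -> str:
    """Pick the fragment the same way the MOD expressions do."""
    return DBSPACES[customer_num % len(DBSPACES)]


# Assumed sample: serial values 1..3000, as a SERIAL column would generate.
distribution = Counter(hash_fragment(n) for n in range(1, 3001))
print(distribution)
```

Each dbspace receives exactly 1,000 of the 3,000 sample rows. The even spread depends on the key values themselves being evenly distributed, which is usually true for a serial column.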
Fragmented/Partitioned Indexes
Fragmentation: CREATE INDEX Statement

- By expression:
    CREATE INDEX idx1 ON table1(col_1)
        FRAGMENT BY EXPRESSION
            col_1 < 10000 IN dbspace1,
            col_1 >= 10000 IN dbspace2;
- No fragmentation scheme specified:
    CREATE INDEX idx1 ON table1(col_1) IN dbspace1;
- Partitioned:
    CREATE INDEX idx1 ON table1(col_1)
        PARTITION BY EXPRESSION
            PARTITION part1 col_1 < 10000 IN dbspace1,
            PARTITION part2 col_1 >= 10000 IN dbspace2;
Fragmenting an Index
Discussion: Is this statement valid?
    CREATE UNIQUE INDEX ia ON tabl(col1)
        FRAGMENT BY EXPRESSION
            col2 <= 10 IN dbsp1,
            col2 > 10 AND col2 <= 100 IN dbsp2,
            col2 > 100 IN dbsp3;
Parallel Index Build (Informix)

- IDS implements a sophisticated design to enable extremely fast index builds
- This design divides the index build process into three (3) subtasks, for vertical parallelism:
  - First, scan threads read the data from disk
  - Next, the data is passed to the sort threads
  - Finally, the sorted sets are appended into a single index tree
- The indexes are built in parallel even if the Parallel Data Query priority (PDQPRIORITY) is set to 0, but you can tune this environment variable for a performance gain
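The scan, sort and merge stages can be pictured with a small Python model. It is only a conceptual sketch of the three subtasks, not the IDS implementation: the helper names and the use of a thread pool and `heapq.merge` are assumptions chosen for illustration, and the rows are (key, rowid) pairs invented for the example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def scan(rows, num_scans):
    """Stage 1: split the table into slices, as parallel scan threads would."""
    return [rows[i::num_scans] for i in range(num_scans)]


def build_sorted_entries(rows, num_workers=3):
    """Stages 2 and 3: sort each slice in a worker, then merge the sorted runs."""
    slices = scan(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        sorted_runs = list(pool.map(sorted, slices))
    # The merged stream is in key order, ready to be appended leaf by leaf.
    return list(heapq.merge(*sorted_runs))


# Hypothetical (key, rowid) pairs in arbitrary physical order.
table_rows = [(k * 7 % 101, rowid) for rowid, k in enumerate(range(1, 101))]
index_entries = build_sorted_entries(table_rows)
```

Because the final stage receives entries already in key order, the tree can be built by appending to the rightmost leaf instead of inserting keys one at a time.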
Informix SYSINDEXES System Catalog

The sysindexes table describes each index in the database:
    SELECT sysindexes.* FROM sysindexes, systables
    WHERE tabname = "items" AND systables.tabid = sysindexes.tabid;
Results:
    idxname    104_10            <- implicit system-generated index, 104_10
    owner      admin
    tabid      104               <- table id for table items
    idxtype    U                 <- unique index, so the index was created to implement a unique constraint
    clustered                    <- not clustered
    part1      1
    part2      2
    part3      0
    part4      0
    ...
    part16     0
    levels     1                 <- 1 level
    leaves     1.000000000000    <- 1 leaf node / page
    nunique    6.000000000000
    clust      1.000000000000

    3 row(s) retrieved.          <- items table has 3 indexes defined on it, although we are showing just partial results
Steps to Indexing
- Determine if the index will be attached, detached or fragmented
  - If fragmented, determine the expression
- Determine the FILLFACTOR for each index
- Determine other characteristics of the index, such as:
  - Simple or composite, and the default order of the columns
  - Clustered or not clustered
  - Unique or duplicate
- Calculate the space requirements for the index and for the temporary dbspace used to build the index
- Determine the optimal locations of the index and temporary dbspaces (defined in DBSPACETEMP) on disk
- Set the PDQPRIORITY and PSORT_NPROCS environment variables
  - To optimize parallel index builds
- Check and optimize configuration parameters if needed:
  - B-tree cleaners (threads that automatically clean up deleted keys)
  - Buffers (to cache I/O of data and index pages)
  - Temporary dbspaces (for sorts in index builds)
  - Parallelism (to speed up index builds and other engine operations)
Estimating Extent Size of an Index

It is possible to estimate the size of an index before it is created
For an attached index, the database server uses the ratio of the index key size to the row size to calculate an extent size for the index:
Index extent size = (index_key_size/table_row_size) * table_extent_size
For a detached index, the database server uses the ratio of the index key size (plus some overhead in bytes) to the rowsize to assign an appropriate extent size for the index:
Detached Index extent size = ((index_key_size + 13)/table_row_size) * table_extent_size
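Both formulas are easy to evaluate in Python. The sketch below simply transcribes them; the function names and the sample sizes (a 4-byte key, a 100-byte row, a 1,000 KB table extent) are hypothetical values chosen for illustration.

```python
def attached_index_extent_size(index_key_size, table_row_size, table_extent_size):
    """(index_key_size / table_row_size) * table_extent_size."""
    # Multiplying before dividing is mathematically the same and avoids
    # small floating-point rounding noise.
    return index_key_size * table_extent_size / table_row_size


def detached_index_extent_size(index_key_size, table_row_size, table_extent_size,
                               overhead_bytes=13):
    """((index_key_size + 13) / table_row_size) * table_extent_size."""
    return (index_key_size + overhead_bytes) * table_extent_size / table_row_size


# Assumed sample: 4-byte key, 100-byte row, 1,000 KB table extent.
print(attached_index_extent_size(4, 100, 1000))   # extent size in KB
print(detached_index_extent_size(4, 100, 1000))   # extent size in KB
```

With these sample values the 13 bytes of overhead dominate a small key: the detached estimate is more than four times the attached one.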
After the index is created, run a query on the system catalog tables, or run oncheck -pe or oncheck -pT mydb:mytab, to get the size of the indexes
Benefits of Indexing
- Use filtering to reduce the number of pages read (I/O)
- Eliminate sorts (implemented as temporary tables)
- Ensure uniqueness of key values (constraints)
- Reduce the number of pages read (key-only reads)
Costs of Indexing
Disk Space Costs
Processing and Maintenance Costs
Summary: Benefits and Costs of Indexing

Benefits
- Faster access
  - Minimizes I/O, avoids implementing sorts
- Enforce constraints and business rules
  - Primary key, foreign keys, unique values
Costs (Overhead)
- Space cost: indexes require disk space
  - There are ways to estimate the size an index will have
- Time cost for maintenance: insert, update, delete operations
  - Both the data and the indexes need to be updated
Indexing Guidelines and Best Practices (1)

- Create an index on:
  - Join columns (used in the WHERE clause for multi-table joins)
  - Highly selective filter columns (columns used in WHERE clauses)
  - Columns frequently used for ordering and grouping (sorting; columns used in ORDER BY and GROUP BY clauses)
    - Also columns used in UNIQUE / DISTINCT clauses
    - And columns used in UNION statements (where we combine queries)
- Avoid highly duplicate indexes
  - Prefer indexes on high-selectivity columns
  - You can use composite indexes to increase index selectivity
- Limit indexes on highly updated tables/columns and on volatile/temp tables
  - Remember the overhead cost of index maintenance (B-tree cleaner)
  - Limit the number of indexes on tables; use only the ones needed
- Keep the key size small
  - Long character strings are not good candidates for indexes
  - Try to use small columns, like smallint, integer, date, char(<small n>)
- Keep the number of indexes small
  - Create only the indexes you really need

- Index only if you need to access <= 4 or 5% of the data in a table
  - The alternative to using an index to access row data in a table is to read the entire table sequentially from top to bottom (sequential scan or full table scan)
  - Sequential scans are better for queries that require a high percentage of the data in a table
  - Remember: using an index to retrieve rows requires two kinds of reads: index reads followed by a table/data read
- Avoid indexes on relatively small tables
  - Sequential table scans are just fine for small tables
  - There is no need to store both table and index data for small tables
- Create primary keys (or, even better, an explicit unique index followed by a primary key constraint) for all tables, where possible
  - Remember: even if you don't explicitly create an index for a primary key, foreign key or unique constraint, Informix will implicitly create an index for you
- Need a functional index?

- Use composite indexes to increase uniqueness
  - Composite indexes may be needed where single-column values are not unique by themselves
  - Remember: in composite indexes, the driving column is the first column, and this should be the most selective column in the index
- Use clustered indexes to speed up retrieval
  - Remember: only a single cluster index is allowed per table, and it will reorder the data rows according to the index. See the guidelines for cluster indexes
- Disable indexes before large DML operations
  - Massive loads, inserts, deletes, updates
  - You can re-enable the indexes and constraints after the large DML operation is finished
- Remember: the indexes on a table should be based on the types of queries you expect to run against the table's columns
  - Always give priority to the most frequently executed queries
  - Having more indexes than you need makes the cost-based query optimizer do additional work to decide which index to use
Troubleshooting
When an index should be used but is not:
- See whether the index is used in the query plan
  - On Informix: use SET EXPLAIN or Visual Explain, and analyze the query plan and cost
- Check for and repair possible index corruption
  - On Informix: oncheck -cI utility (-c: check, -I: indexes)
- Update the optimizer's statistics
  - On Informix: UPDATE STATISTICS command
- Check whether the index is enabled or disabled
  - If disabled, enable it: SET INDEXES ... ENABLED (or SET CONSTRAINTS ... ENABLED)
- Check whether this index is redundant with another
  - Remove unnecessary indexes, and re-run statistics as needed
- Check the same query on a newer version of the DBMS engine
  - If the index is used and the query runs faster, this might mean a defect in the previous release of the product that is fixed in the newest one
- Try forcing the index using an optimizer directive
  - If times and costs are better using the index that was forced, this might be a defect in the DBMS. Collect all the information and call Tech Support