Objectives
At the end of this training session, you will be able to:
Understand the benefits and costs of indexes for databases
Understand how indexes are implemented
Identify the different index structures
Understand the B+ tree
Understand features and guidelines for using:
  Implicit versus explicit indexes
  Unique versus duplicate indexes
  Simple versus composite indexes
  Cluster indexes
  Functional indexes
Understand decision criteria for index storage:
  Attached versus detached indexes
  Index fragmentation
Select an appropriate fill factor for an index
Know how to alter, drop, rename and maintain an index
Understand how indexing works on IBM Informix Dynamic Server (IDS)
Sequential Scans
A Sequential Scan or Table Scan:
  Reads all the pages that belong to the table and returns all the rows of the table
  Starts with the first page of the table and travels in order across all the devices that contain pages of the table until it retrieves the last page
A sequential scan of a large table is an expensive operation in OLTP systems
Small sequential scans (seq scans in small tables) are acceptable in OLTP
I/O is an expensive (mechanical) operation
A large sequential scan also fills up the memory buffers with unnecessary pages, affecting the buffer profile and the performance of the database server
An index on the appropriate column can save thousands, tens of thousands, or in extreme cases, even millions of disk operations during a query
However, indexes entail costs (in space, processing and maintenance)
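To put rough numbers on that saving, the sketch below compares the pages read by a full table scan with the pages read by a single-key lookup through a balanced tree index. The row count, rows per page, and keys per index page are illustrative assumptions, not Informix measurements.

```python
import math

def seq_scan_pages(row_count, rows_per_page):
    """Pages read when every page of the table must be visited."""
    return math.ceil(row_count / rows_per_page)

def index_lookup_pages(row_count, keys_per_index_page):
    """Approximate pages read for one key: one index page per tree level,
    plus one data page for the row itself."""
    levels = 1
    capacity = keys_per_index_page
    while capacity < row_count:
        capacity *= keys_per_index_page
        levels += 1
    return levels + 1

rows = 10_000_000  # assumed table size
print(seq_scan_pages(rows, rows_per_page=50))             # 200000
print(index_lookup_pages(rows, keys_per_index_page=200))  # 5
```

Under these assumptions the index turns 200,000 page reads into 5, which is the kind of gap the slide refers to.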
A simple select to find related rows from the two tables is:
B-Tree (2)
Allows several keys in one index page / node
Keeps similar-valued records together on a disk page
  This takes advantage of locality of reference, unlike binary search trees (BST)
Allows nodes to be incompletely filled
  Every node in the tree is filled to at least a certain minimum percentage
  Some space is wasted (nodes are not entirely full)
  Requires less re-balancing than binary search trees (BST)
Reduces the number of disk fetches necessary during a search
Is always in perfect balance, as all leaf nodes are at the same depth
Keeps data sorted and allows searches, insertions, and deletions in logarithmic amortized time
Unlike a self-balancing BST, it is optimized for systems that read and write large blocks of data
Grows from the bottom:
  When a node is over-full, it is split and a key separating the two resulting nodes is added one level up
  Deletions are the reverse of additions
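The "grows from the bottom" behavior can be sketched as follows. This is a minimal illustration of a single node split in a plain B-tree, assuming a node is just a sorted list of keys; the function names and the `max_keys` limit are hypothetical and do not reflect how Informix lays out index pages.

```python
def split_node(keys):
    """Split an over-full sorted key list.
    The median key is promoted to the parent; the rest form two nodes."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

def insert_into_node(keys, new_key, max_keys):
    """Insert new_key in sorted order; split only if the node overflows.
    Returns (node, None, None) when no split is needed."""
    keys = sorted(keys + [new_key])
    if len(keys) <= max_keys:
        return keys, None, None
    return split_node(keys)

left, promoted, right = insert_into_node([10, 20, 30, 40], 25, max_keys=4)
print(left, promoted, right)  # [10, 20] 25 [30, 40]
```

If the parent overflows in turn, the same split is applied there, so the tree gains a level only when the root itself splits.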
2009 IBM Corporation
B-Tree (3)
In B-trees:
  The root node points down to branch nodes
  Branch nodes point down to leaf nodes
  Leaf nodes address the actual data rows
But the B-tree has a problem:
  It is inefficient for searching a range of data values
  It cannot be used to move laterally through the tree, only up and down
If all adjacent sibling nodes were connected, you could scan an entire tree level with minimum effort. This is done by B+ trees, used by some of the major database vendors
A key value (the index key)
Row IDs to find the data row(s)
A delete flag
Each index node/page is doubly linked with its previous and next peer index node, to optimize:
  Key-only and range searches
  Both forward and reverse order searches
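The doubly linked pages described above can be sketched as a chain of leaf pages that is walked in either direction. This is a simplified model: `LeafPage`, `link`, and the scan functions are hypothetical names, and the scans start at the ends of the chain, whereas a real index would first descend from the root to the starting leaf.

```python
class LeafPage:
    """A leaf page holding sorted keys, linked to its left and right peers."""
    def __init__(self, keys):
        self.keys = keys
        self.prev = None
        self.next = None

def link(pages):
    """Connect each page to its neighbors in both directions."""
    for left, right in zip(pages, pages[1:]):
        left.next, right.prev = right, left
    return pages

def scan_forward(first, low, high):
    """Ascending range scan: follow next pointers until a key exceeds high."""
    out, page = [], first
    while page is not None:
        for key in page.keys:
            if key > high:
                return out
            if key >= low:
                out.append(key)
        page = page.next
    return out

def scan_reverse(last, low, high):
    """Descending range scan: follow prev pointers until a key falls below low."""
    out, page = [], last
    while page is not None:
        for key in reversed(page.keys):
            if key < low:
                return out
            if key <= high:
                out.append(key)
        page = page.prev
    return out

pages = link([LeafPage([10, 40]), LeafPage([50, 120]), LeafPage([300, 450])])
print(scan_forward(pages[0], 50, 300))   # [50, 120, 300]
print(scan_reverse(pages[-1], 50, 300))  # [300, 120, 50]
```

Both scans touch only leaf pages, which is why one index can serve ascending and descending requests.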
An index key contains the value of the column we want to search for
e.g.: first_name, SSN, zip_code
Index Set
Sequence Set
In Informix, forward (ascending) and reverse-order (descending) index scans are available using one single index. Examples (the second one is a key-only search):
Select fname, lname from customer order by customer_num;
Select customer_num from customer order by customer_num desc;
Select customer_num, fname, lname from customer where customer_num between 50 and 300;
Select customer_num from customer where customer_num >= 50 and customer_num <= 300;
B+ Tree Splits
R-Tree (1)
An R-tree index is a secondary data structure (or access method) that organizes access to data in a way similar to a B-tree index. It is used in Informix for multi-dimensional and spatial data.
The R-tree is specifically designed to index table columns that contain the following types of data:
Multidimensional data:
  Spatial data in two or three dimensions, such as (X, Y) coordinates of geographical data
  Combinations of attributes, such as a configuration for a house that includes the number of stories, the number of bedrooms, the number of baths, the age of the house, and the square footage
Range values:
  Such as the time of a television program (9:00 P.M. to 9:30 P.M.)
A common real-world usage for an R-tree might be: "Find all museums within 2 miles (3.2 km) of my current location".
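A query like this is usually answered by pruning with bounding rectangles. The sketch below assumes a tiny hand-built tree of nested dictionaries, where each node carries a minimum bounding rectangle (`mbr`) as `(xmin, ymin, xmax, ymax)`; the museum names, coordinates, and structure are hypothetical, and the search box stands in for the circle around the current location.

```python
def overlaps(a, b):
    """True if rectangles (xmin, ymin, xmax, ymax) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(node, query, hits):
    """Collect the names of leaf entries whose rectangle overlaps the query.
    Subtrees whose bounding rectangle misses the query are skipped entirely."""
    if not overlaps(node["mbr"], query):
        return hits
    if "children" in node:
        for child in node["children"]:
            search(child, query, hits)
    else:
        hits.append(node["name"])
    return hits

tree = {
    "mbr": (0, 0, 10, 10),
    "children": [
        {"mbr": (0, 0, 4, 10), "children": [
            {"mbr": (1, 1, 1, 1), "name": "A"},
            {"mbr": (3, 8, 3, 8), "name": "B"},
        ]},
        {"mbr": (6, 0, 10, 10), "children": [
            {"mbr": (7, 2, 7, 2), "name": "C"},
        ]},
    ],
}

# Box of "radius" 2 around an assumed current location (2, 2).
print(search(tree, (0, 0, 4, 4), []))  # ['A']
```

The rectangle test is only a coarse filter; an exact distance check on the surviving candidates would follow.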
R-Tree (2)
Simple example of an R-tree for 2D rectangles
Raw device (disk partition), or Regular file system file (cooked file)
Root dbspace: stores system catalog information and databases
Blobspace: stores simple binary large objects
Sbspace: stores smart binary large objects
Temporary dbspaces: for non-logged temp data
You can place different database objects in different disk spaces:
  To balance I/O work across the disk devices and controllers available
  Example:
    The database stores_demo in a data dbspace dbs1
    The table customer in a data dbspace dbs2
    The index ix_cust in a data dbspace dbs3
If no dbspace is specified, the database is created in rootdbs (the root dbspace)
If no dbspace is specified, the table is created in a new tblspace in the dbspace where its database resides
If no dbspace is specified, the index is created in a new tblspace in the same dbspace as its table
When an index is created on an empty table, just the root node of the B-tree is created
By default, index extents are created in the dbspace that holds the data (table) extents
A detached index can have a fragmentation strategy different from the one used by its table; you set it up explicitly with CREATE INDEX
When creating an index on a column, if you omit or specify the ASC keyword, Informix stores the key values in ascending order
  Default order for a column
  From the smallest to the largest key
  Example on customers' last names: Albertson, Beatty, Currie
create index ix_cust on customer(lname asc);
or
create index ix_cust on customer(lname);
Use the DESC keyword for Informix to store the key values in descending order
  From the largest to the smallest key
  Example on customers' last names: Currie, Beatty, Albertson
create index ix_cust on customer(lname desc);
Explicit indexes are created using the CREATE INDEX statement
It is recommended to explicitly create indexes that exactly match the referential constraint and then use ALTER TABLE to add the constraint
The constraint will use the existing index instead of implicitly creating one
CREATE TABLE tab1 (col1 INTEGER, col2 INTEGER, col3 CHAR (25)) IN table1dbs;
CREATE INDEX index1 ON tab1(col1) IN idx1dbs FILLFACTOR 70;
ALTER TABLE tab1 ADD CONSTRAINT PRIMARY KEY (col1);
Any index listing two or more columns is a composite index
List the columns in order from most frequently used to least frequently used
Facilitates multiple-column joins
Increases uniqueness of indexed values
Example:
CREATE INDEX ix_items ON items(manu_code, stock_num);
To join column a, columns ab, or columns abc to another table To implement ORDER BY or GROUP BY on columns a, ab or abc
but not on b, c, ac, or bc
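The leading-column rule above can be checked mechanically. The sketch below is a simplified model, assuming a composite index on columns (a, b, c) and treating the query columns as an unordered set; for ORDER BY the column order would matter as well. The function names are hypothetical and this is not how the Informix optimizer is implemented.

```python
def usable_prefix(index_columns, query_columns):
    """Return the leading index columns that the query columns cover,
    stopping at the first index column the query does not use."""
    wanted = set(query_columns)
    prefix = []
    for col in index_columns:
        if col not in wanted:
            break
        prefix.append(col)
    return prefix

def index_fully_covers(index_columns, query_columns):
    """True only when the query columns form a leading prefix of the index."""
    return len(usable_prefix(index_columns, query_columns)) == len(set(query_columns))

index = ["a", "b", "c"]
for cols in (["a"], ["a", "b"], ["a", "b", "c"], ["b"], ["c"], ["a", "c"], ["b", "c"]):
    print(cols, index_fully_covers(index, cols))
```

The first three combinations print True and the remaining four print False, matching the lists on the slide.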
when all of the desired columns are contained within the index
You can recluster the table to regain performance by issuing another ALTER INDEX TO CLUSTER statement on the clustered index:
ALTER INDEX ix_cust TO CLUSTER;
The TO NOT CLUSTER option drops the cluster attribute from the index without affecting the physical table
Creating indexes
Examples:
CREATE UNIQUE INDEX ix_orders ON orders(order_num) IN idx_dbs;
CREATE INDEX ix_items ON items(manu_code, stock_num);
CREATE UNIQUE CLUSTER INDEX ix_manufact ON manufact(manu_code) FILLFACTOR 80;
CREATE INDEX ix_man_stk ON items(manu_code desc, stock_num);
CREATE INDEX order_ix1 ON orders (order_num, order_date desc);
Advantages of Fragmentation
Parallel scans and other parallel operations
  Several disk devices can be read/processed in parallel
Balanced I/O
Fragment discrimination
  I/O is done only in the portion/fragment that contains the data being asked for
Higher availability
  If a dbspace containing a fragment is unavailable, the data in the other fragments that are up and available can still be read
Scalability (ability to grow in size)
  There is a limit on the maximum amount of data per fragment/dbspace; fragmenting the table/index gives it room to grow further
CREATE TABLE table1(
  col_1 SERIAL,
  col_2 CHAR(20))
FRAGMENT BY ROUND ROBIN IN dbspace1, dbspace2;
CREATE TABLE table1(
  customer_num SERIAL,
  lname CHAR(20)...)
FRAGMENT BY EXPRESSION
  MOD(customer_num, 3) = 0 IN dbspace1,
  MOD(customer_num, 3) = 1 IN dbspace2,
  MOD(customer_num, 3) = 2 IN dbspace3;
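How rows end up in dbspaces under the two strategies can be modeled in a few lines. This is a simplified sketch: the expression rule mirrors the MOD(customer_num, 3) clauses, while the round-robin function simply cycles through the dbspaces, which is an assumption about the general idea rather than a description of the server's internal placement logic.

```python
from itertools import cycle

DBSPACES = ["dbspace1", "dbspace2", "dbspace3"]

def fragment_by_expression(customer_num):
    """Pick the dbspace whose MOD(customer_num, 3) clause matches the row."""
    return DBSPACES[customer_num % 3]

def fragment_round_robin(row_count, dbspaces):
    """Assign consecutive rows to dbspaces in turn (simplified model)."""
    targets = cycle(dbspaces)
    return [next(targets) for _ in range(row_count)]

print(fragment_by_expression(101))  # dbspace3, since 101 mod 3 = 2
print(fragment_round_robin(5, ["dbspace1", "dbspace2"]))
```

With the expression strategy, a query for a single customer_num needs to read only one fragment, because the value determines the dbspace; round robin offers no such shortcut.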
Fragmented/Partitioned Indexes
Fragmenting an Index
Discussion: Is this statement valid?
CREATE UNIQUE INDEX ia ON tabl(col1) FRAGMENT BY EXPRESSION col2 <= 10 IN dbsp1, col2 > 10 AND col2 <= 100 IN dbsp2, col2 > 100 IN dbsp3;
3 row(s) retrieved.
<- items table has 3 indexes defined on it, although we are showing just partial results
Steps to Indexing
Determine if the index will be attached, detached or fragmented
  If fragmented, determine the expression
Determine the FILLFACTOR for each index
Determine other characteristics of the index, like:
  Simple or composite, and default order of the columns
  Clustered or not clustered
  Unique or duplicate
Calculate the space requirements for the index and for the temporary dbspace used to build the index
Determine the optimal locations of the index and temporary dbspaces (defined in DBSPACETEMP) on disk
Set the PDQPRIORITY and PSORT_NPROCS environment variables
  To optimize parallel index builds
Check and optimize configuration parameters if needed
  B-tree cleaners (threads that automatically clean up deleted keys)
  Buffers (to cache I/O of data and index pages)
  Temporary dbspaces (for sorts in index builds)
  Parallelism (to speed up index builds and other engine operations)
For an attached index, the database server uses the ratio of the index key size to the row size to calculate an extent size for the index:
Index extent size = (index_key_size/table_row_size) * table_extent_size
For a detached index, the database server uses the ratio of the index key size (plus some overhead in bytes) to the row size to assign an appropriate extent size for the index:
Detached Index extent size = ((index_key_size + 13)/table_row_size) * table_extent_size
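As a worked example of the two formulas, the sketch below plugs in illustrative values: a 4-byte key, a 128-byte row, and a table extent size of 1024 (in whatever unit the table extent is expressed). These numbers are assumptions chosen to keep the arithmetic simple, and the function names are hypothetical.

```python
def attached_index_extent(index_key_size, table_row_size, table_extent_size):
    """Index extent size = (index_key_size / table_row_size) * table_extent_size"""
    return (index_key_size / table_row_size) * table_extent_size

def detached_index_extent(index_key_size, table_row_size, table_extent_size, overhead=13):
    """Detached index extent size =
    ((index_key_size + overhead) / table_row_size) * table_extent_size"""
    return ((index_key_size + overhead) / table_row_size) * table_extent_size

print(attached_index_extent(4, 128, 1024))  # 32.0
print(detached_index_extent(4, 128, 1024))  # 136.0
```

The 13 bytes of per-key overhead dominate for short keys, which is why the detached estimate is several times larger here.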
After the index is created, run a query on the system catalog tables, or run oncheck -pe or oncheck -pT mydb:mytab, to get the size of the indexes
Benefits of Indexing
Use filtering to reduce the number of pages read (I/O)
Eliminate sorts (implemented as temporary tables)
Ensure uniqueness of key values (constraints)
Reduce the number of pages read (key-only reads)
Costs of Indexing
Costs (Overhead)
Space cost: Require disk space
There are ways to estimate the size an index will have
Troubleshooting
When an Index should be used but it is not:
References
IBM Informix Dynamic Server (IDS) 11.50 Information Center (manuals online)
  http://publib.boulder.ibm.com/infocenter/idshelp/v115/index.jsp
IBM IDS manuals available in PDF format
  http://www-01.ibm.com/software/data/informix/pubs/library/
All IBM Informix technical publications and link to Support site
  http://www-01.ibm.com/software/data/informix/pubs/
  http://www-01.ibm.com/support/docview.wss?uid=swg27013894
Look for: Getting Started Guide, Design and Implementation Guide, SQL Tutorial, Performance Guide and Administrator's Guide
Old but good article on fragmentation: Divide and conquer: Using fragmentation for smart data access
  http://www.ibm.com/developerworks/db2/zones/informix/library/techarticle/0206metzger/0206metzger1.html