Introduction

Indexing is one of the most important features of the Teradata RDBMS.

In the Teradata RDBMS, an index is used to define row uniqueness and to retrieve data rows. It can also be used to enforce the primary key and unique constraints for a table. The Teradata RDBMS supports five types of indexes:

* Unique Primary Index (UPI)
* Unique Secondary Index (USI)
* Non-Unique Primary Index (NUPI)
* Non-Unique Secondary Index (NUSI)
* Join Index

The typical index contains two fields: a value and a pointer to instances of that value in a data table. Because the Teradata RDBMS uses hashing to distribute rows across the AMPs, the value is condensed into an entity called a row hash, which is used as the pointer. The row hash is not the value itself, but a mathematically transformed address. The Teradata RDBMS uses this transformed address as a retrieval index.

The following rules apply to the indexes used in the Teradata relational database:

* An index is a scheme used to distribute and retrieve rows of a data table. It can be based on the values in one or more columns of the table.
* A table can have a number of indexes, including one primary index and up to 32 secondary indexes.
* An index for a relational table may be primary or secondary, and may be unique or non-unique. Each kind of index affects system performance and can be important to data integrity.
* An index is usually defined on a table column whose values are frequently used in specifying WHERE constraints or join conditions.
* An index is used to enforce PRIMARY KEY and UNIQUE constraints.

The CREATE TABLE statement allows UNIQUE and PRIMARY KEY to be defined as constraints on a table, and each index may be given a name, which allows Teradata SQL statements to refer to it.

Primary Index

The primary index determines the distribution of table rows on the disks controlled by the AMPs. In the Teradata RDBMS, a primary index is required for row distribution and storage. When a new row is inserted, its hash code is derived by applying a hashing algorithm to the value in the column(s) of the primary index (as shown in the following figure). Rows having the same primary index value are stored on the same AMP.
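The hash-based distribution described above can be pictured with a small Python model. This is only a sketch under stated assumptions: the MD5-based `row_hash`, the modulo mapping in `amp_for`, and the choice of four AMPs are all stand-ins invented for illustration, not the hashing algorithm or hash map that the Teradata RDBMS actually uses. What it does show is the behaviour the text relies on: equal primary index values always land on the same AMP, so a nearly unique index spreads rows evenly while a column with few distinct values piles rows onto a few AMPs.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # assumed number of AMPs, chosen only for this illustration


def row_hash(value) -> int:
    """Stand-in for the row hash: a 32-bit integer derived from the index value.

    MD5 is used here only because it is deterministic and well mixed;
    it is NOT the hashing algorithm of the Teradata RDBMS.
    """
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def amp_for(value, num_amps: int = NUM_AMPS) -> int:
    """Map a primary index value to an AMP number (assumed modulo mapping)."""
    return row_hash(value) % num_amps


def distribution(values, num_amps: int = NUM_AMPS) -> Counter:
    """Count how many rows each AMP would receive for the given index values."""
    return Counter(amp_for(v, num_amps) for v in values)


if __name__ == "__main__":
    # A unique column (for example a serial number) spreads rows over all AMPs.
    print("unique values:", dict(sorted(distribution(range(1, 1001)).items())))
    # A column with two distinct values can occupy at most two AMPs.
    skewed = ["chem"] * 900 + ["phys"] * 100
    print("skewed values:", dict(sorted(distribution(skewed).items())))
```

Running the script prints the row count per AMP for both cases. The second case is the skew that a poorly chosen non-unique primary index produces.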

Rules for Defining primary indexes

The primary index for a table should represent the data values most used by the SQL to access the data for the table. Careful selection of the primary index is one of the most important steps in creating a table. Defining a primary index should follow these rules:

* A primary index should be defined to provide a nearly uniform distribution of rows among the AMPs. The more unique the index, the more even the distribution of rows and the better the space utilization.
* The index should be defined on as few columns as possible.
* A primary index can be either unique or non-unique. A unique index must have a unique value in the corresponding fields of every row; a non-unique index permits the insertion of duplicate field values. The unique primary index is more efficient.
* Once created, the primary index cannot be dropped or modified; to change the index, the table must be recreated.

If a primary index is not defined in the CREATE TABLE statement through an explicit declaration of a PRIMARY INDEX, the default is to use one of the following:

* PRIMARY KEY
* First UNIQUE constraint
* First column

The primary index values are stored as an integral part of the primary table. The primary index should be based on the set selection most frequently used to access rows from a table and on the uniqueness of the value.

Creating primary index

A unique primary index for a table is created using the (UNIQUE) PRIMARY INDEX clause of the CREATE TABLE statement. Non-unique primary indexes are created in the same way, but omit the keyword UNIQUE. If an index is defined on more than one column, all index columns must be specified in the WHERE clause of a request in order for a row or rows to be directly accessed.

Example of Creating Primary Index in Teradata RDBMS

A unique primary index for a table is created using the UNIQUE PRIMARY INDEX clause of the CREATE TABLE statement.
    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    create table organic (serial_No integer, organic_name char(15),
    Carbon_number smallint, amount smallint)
    unique primary index (serial_No);

    *** Table has been created.
    *** Total elapsed time was 1 second.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    create table inorganic (serial_No integer, inorgnic_name char(15),
    anion char(5), cation char(6), amount smallint)
    unique primary index (serial_No);

    *** Table has been created.
    *** Total elapsed time was 1 second.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    create table order_log (log_No smallint, student_name char(15),
    order_date char(10), checkin_date char(10))
    unique primary index (log_No);

    *** Table has been created.
    *** Total elapsed time was 1 second.

A non-unique primary index for a table is created using the PRIMARY INDEX clause of the CREATE TABLE statement.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    create table student (student_ID char(15), student_name char(20),
    Department char(10))
    primary index (student_ID);

    *** Table has been created.
    *** Total elapsed time was 1 second.

Once created, the primary index cannot be dropped or modified; to change the index, the table must be recreated.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    drop index (student_ID) on student;

    *** Failure 3525 The user cannot create or drop a PRIMARY index.
                    Statement# 1, Info =0
    *** Total elapsed time was 1 second.

Access data using primary index

The primary index should be based only on an equality search. When a query contains a WHERE constraint that specifies the unique primary index value(s), the request is processed by hashing the value to locate the AMP where the row is stored, and then retrieving the row that contains a matching value in the hash code portion of its rowID. For example, since employee_number is the unique primary index for the Customer_Service.employee table, assume that an employee_number value is used as an equality constraint in a request as follows:

    SELECT employee_number
    FROM customer_service.employee
    WHERE employee_number = 1024;

This request is processed by hashing 1024 to do the following:

* Locate the AMP where the row is stored.
* Retrieve the row that contains a matching value in the hash code portion of its rowID.

The Teradata RDBMS processes data most efficiently if table rows are uniformly distributed (hashed) across the AMPs on which they are stored.

Primary index versus primary key

The column(s) chosen to be the primary index for a table are frequently the same as the primary key identified during the data modeling process, but there are conceptual differences between these two terms:

    Term                     Primary Key                      Primary Index
    -----------------------  -------------------------------  -------------------------------
    Definition               A relational concept used to     Used to store rows on disk
                             determine relationships among
                             entities and to define
                             referential constraints
    Requirement              Not required, unless             Required
                             referential integrity checks
                             are to be performed
    Defining                 Defined by CREATE TABLE          Defined by CREATE TABLE
                             statement                        statement
    Uniqueness               Unique                           Unique or non-unique
    Function                 Identifies a row uniquely        Distributes rows
    Values can be changed?   No                               Yes
    Can be null?             No                               Yes
    Related to access path?  No                               Yes

Secondary Index

In addition to a primary index, up to 32 unique and non-unique secondary indexes can be defined for a table. Compared with primary indexes, secondary indexes allow access to information in a table by alternate, less frequently used paths. A secondary index is a subtable that is stored on all AMPs, but separately from the primary table. The subtables, which are built and maintained by the system, contain the following:

* RowIDs of the subtable rows
* Base table index column values
* RowIDs of the base table rows (pointers)

As shown in the following figure, the secondary index subtable on each AMP is associated with the base table by the rowID.
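The link between a secondary index subtable and the base table can also be pictured with a toy Python model. Everything in it is an assumption made for illustration: the `Amp` and `ToySystem` classes, the MD5-based `row_hash`, the modulo AMP mapping, and the `(row hash, uniqueness counter)` form of the rowID are simplified stand-ins, not Teradata internals. The sketch stores each base row on the AMP chosen by the hash of its primary index value, and stores the matching subtable row (index value plus base-table rowID) on the AMP chosen by the hash of the secondary index value.

```python
import hashlib

NUM_AMPS = 4  # assumed number of AMPs for this illustration


def row_hash(value) -> int:
    """Stand-in row hash (MD5-based; not the real Teradata algorithm)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class Amp:
    """One AMP holding a slice of the base table and of the index subtable."""

    def __init__(self):
        self.base_rows = {}  # base-table rowID -> data row
        self.usi_rows = {}   # secondary index value -> base-table rowID


class ToySystem:
    """Hypothetical model of a table with a UPI on serial_No and a USI on organic_name."""

    def __init__(self, num_amps: int = NUM_AMPS):
        self.amps = [Amp() for _ in range(num_amps)]

    def _amp_no(self, hash_value: int) -> int:
        return hash_value % len(self.amps)

    def insert(self, serial_no, organic_name):
        # The base row goes to the AMP selected by the primary index hash.
        h = row_hash(serial_no)
        amp = self.amps[self._amp_no(h)]
        uniq = sum(1 for (rh, _) in amp.base_rows if rh == h) + 1
        row_id = (h, uniq)  # simplified rowID: hash plus a uniqueness counter
        amp.base_rows[row_id] = {"serial_No": serial_no, "organic_name": organic_name}
        # The subtable row goes to the AMP selected by the index value hash.
        index_amp = self.amps[self._amp_no(row_hash(organic_name))]
        index_amp.usi_rows[organic_name] = row_id

    def select_by_name(self, organic_name):
        """Return (row, set of AMP numbers touched) for an equality lookup."""
        index_amp_no = self._amp_no(row_hash(organic_name))
        row_id = self.amps[index_amp_no].usi_rows.get(organic_name)
        if row_id is None:
            return None, {index_amp_no}
        base_amp_no = self._amp_no(row_id[0])
        return self.amps[base_amp_no].base_rows[row_id], {index_amp_no, base_amp_no}


if __name__ == "__main__":
    system = ToySystem()
    system.insert(1, "methanol")
    system.insert(2, "ethanol")
    row, amps_touched = system.select_by_name("methanol")
    print(row, "AMPs touched:", sorted(amps_touched))
```

A lookup through the subtable touches at most two AMPs in this model: the one holding the subtable row and the one holding the base row. If both happen to be the same AMP, only one is involved.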

Defining and creating secondary index

Secondary indexes are optional. Unlike the primary index, a secondary index can be added or dropped without recreating the table. One or more secondary indexes can be defined in the CREATE TABLE statement, or added to an existing table using the CREATE INDEX statement or ALTER TABLE statement. DROP INDEX can be used to drop a named or unnamed secondary index. Because secondary indexes require subtables, they need additional disk space and may therefore require additional I/Os for INSERTs, DELETEs, and UPDATEs. Generally, secondary indexes are defined on columns whose values are frequently used in WHERE constraints.

Example of creating and updating Secondary Index in Teradata RDBMS

A secondary index for a table is created using the (UNIQUE) INDEX clause of the CREATE TABLE statement.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    create table organic (serial_No integer, organic_name char(15),
    Carbon_number smallint, amount smallint)
    unique primary index (serial_No),
    unique index (organic_name);

    *** Table has been created.
    *** Total elapsed time was 1 second.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    create table order_log (log_No smallint, student_name char(15),
    order_date char(10), checkin_date char(10))
    unique primary index (log_No),
    index (student_name);

    *** Table has been created.
    *** Total elapsed time was 1 second.

There can be one or more secondary indexes in the CREATE TABLE statement.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    create table inorganic (serial_No integer, inorgnic_name char(15),
    anion char(5), cation char(6), amount smallint)
    unique primary index (serial_No),
    index (anion),
    index (cation);

    *** Table has been created.
    *** Total elapsed time was 1 second.

Secondary indexes can be added to an existing table using the CREATE INDEX statement.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    create index index1 (order_date) on order_log;

    *** Index has been created.
    *** Total elapsed time was 1 second.

DROP INDEX can be used to drop a named or unnamed secondary index.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    drop index index1 on order_log;

    *** Index has been dropped.
    *** Total elapsed time was 1 second.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    drop index (anion), index (cation) on inorganic;

    *** Index has been dropped.
    *** Total elapsed time was 1 second.

Access data using secondary index

If a Teradata SQL request uses secondary index values in a WHERE constraint, the optimizer may use the rowID in a secondary index subtable to access the qualifying rows in the data table. If a secondary index is used only periodically by certain applications and is not routinely used by most applications, disk space can be saved by creating the index when it is needed and dropping it immediately after use.
A unique secondary index is very efficient: it has one row per value, requires no spool file, and typically involves only two AMPs when it is used to access a row. Unique secondary indexes can thus improve performance by avoiding the overhead of scanning all AMPs. For example, if a unique secondary index is defined on the department_name column of the Customer_service.department table (assuming that no two departments have the same name), then the following query is processed using two AMPs:

    SELECT department_number
    FROM customer_service.department
    WHERE department_name = 'Education';

In this example, the request is sent to AMP n, which contains the rowID for the secondary index value 'Education'. This AMP, in turn, sends the request to AMP m, where the data row containing that value is stored. Note that the rowID and the data row may reside on the same AMP, in which case only one AMP is involved.

A non-unique secondary index (NUSI) may have multiple rows per value. As a general rule, a NUSI should not be defined if the maximum number of rows per value exceeds the number of data blocks in the table. A NUSI is efficient only if the number of rows accessed is a small percentage of the total number of data rows in the table. It can be useful for complex conditional expressions or for processing aggregates. For example, if the contact_name column is defined as a secondary index for the customer_service.contact table, the following statement can be processed by the secondary index:

    SELECT *
    FROM customer_service.contact
    WHERE contact_name = 'Mike';

After the request is submitted, the optimizer first determines whether it is faster to do a full-table scan of the base table rows or a full scan of the secondary index subtable to get the rowIDs of the qualifying base table rows. If the index is used, those rowIDs are placed into a spool file and then used to access the base table rows.

Non-unique secondary index access is used for request processing only when it is less costly than a complete table search.

Join Index

A join index is an indexing structure containing columns from multiple tables, specifically the resulting columns from one or more tables. Rather than having to join individual tables each time the join operation is needed, the query can be resolved via a join index, which in most cases dramatically improves performance.
Effects of Join index

Depending on the complexity of the joins, the join index helps improve the performance of certain types of work. The following need to be considered when manipulating join indexes:

* Load Utilities: Join indexes are not supported by the MultiLoad and FastLoad utilities; they must be dropped and then recreated after the table has been loaded.
* Archive and Restore: Archive and Restore cannot be used on the join index itself. During a restore of a base table or database, the join index is marked as invalid. The join index must be dropped and recreated before it can be used again in the execution of queries.
* Fallback Protection: Join index subtables cannot be Fallback-protected.
* Permanent Journal Recovery: The join index is not automatically rebuilt during the recovery process. Instead, it is marked as invalid and must be dropped and recreated before it can be used again in the execution of queries.
* Triggers: A join index cannot be defined on a table with triggers.
* Collecting Statistics: In general, there is no benefit in collecting statistics on a join index for joining columns specified in the join index definition itself. Statistics related to these columns should be collected on the underlying base table rather than on the join index.

Defining and creating join index

Join indexes can be created and dropped by using the CREATE JOIN INDEX and DROP JOIN INDEX statements. Join indexes are automatically maintained by the system when updates (UPDATE, DELETE, and INSERT) are performed on the underlying base tables. Additional steps are included in the execution plan to regenerate the affected portion of the stored join result.

Example of creating and updating Join Index in Teradata RDBMS

Join indexes can be created on multiple tables by using the CREATE JOIN INDEX statement.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    create join index all_chemical as
    select (organic_name, carbon_number), (inorgnic_name, cation, anion)
    from organic inner join inorganic
    on organic_name = inorgnic_name;

    *** Index has been created.
    *** Total elapsed time was 2 seconds.

Join indexes can also be created on a single table by using the CREATE JOIN INDEX statement.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    create join index chem_name as
    select organic_name from organic;

    *** Index has been created.
    *** Total elapsed time was 2 seconds.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    create join index ion as
    select anion, cation from inorganic;

    *** Index has been created.
    *** Total elapsed time was 1 second.

It is better to define a primary index when creating a join index.
    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    create join index all_chemicals as
    select (organic_name, carbon_number), (inorgnic_name, cation, anion)
    from organic inner join inorganic
    on organic_name = inorgnic_name
    primary index (organic_name);

    *** Index has been created.
    *** Total elapsed time was 2 seconds.

A secondary index can be defined on top of a join index.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    create index inorg all (anion, cation) on all_chemicals;

    *** Index has been created.
    *** Total elapsed time was 2 seconds.

Join indexes can be dropped by using the DROP JOIN INDEX statement.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    drop join index all_chemical;

    *** Index has been dropped.
    *** Total elapsed time was 2 seconds.

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    drop join index chem_name;

    *** Index has been dropped.
    *** Total elapsed time was 2 seconds.

Access data using Join index

A join index is useful for queries where the index structure contains all of the columns referenced by one or more joins in a query. The join index was developed so that frequently executed join queries could be processed more efficiently. Like the other indexes, a join index stores rowID pointers to the associated base table rows.

Another use of the join index is to define it on a single table. This improves the performance of single-table scans that can be resolved without accessing the base table.

Using Index to Process SQL Statement or Access Data

Each type of index has a specific effect on system performance. Row selection is more efficient using a unique index. When a SELECT statement uses a unique index in a WHERE clause, no spool file needs to be created for intermediate storage of the result, because only one row is expected. An index that is not unique allows more than one row to have the same index value. Therefore, row selection using a non-unique index may require a spool file to hold intermediate rows for final processing.

The Teradata relational database system does not permit explicit use of indexes in SQL queries. When a request is entered, the optimizer examines the following available information about the table to determine whether an index is used during processing:

* Number of rows in the table
* Statistics collected for the table
* Number and types of indexes defined for the table
* UPDATEs, DELETEs, and PRIMARY KEY and UNIQUE constraints

The optimizer decides which index or indexes to use to optimize the query; it selects whichever index or indexes will return the query result most quickly. Whether an index is used to process a Teradata SQL statement or to access data in a table depends on the following factors:

* How the statement is structured.
* Whether current statistics exist for the table.
* Whether PRIMARY KEY or UNIQUE constraints need to be validated.

Example of using Index to process SQL statement and access data

Process SQL statement and access data using primary index (using EXPLAIN to display how data is accessed).

* The rxw05.order_log table has a unique primary index (log_No) and a secondary index (student_name).

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    select * from rxw05.order_log where log_No = 15;

    *** Query completed. 1 rows found. 4 columns returned.
    *** Total elapsed time was 1 second.

    log_No  student_name     order_date  checkin_date
    ------  ---------------  ----------  ------------
        15  Jone Smith       12/02/99    12/16/99

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    explain select * from rxw05.order_log where log_No = 15;

    *** Help information returned. 6 rows.
    *** Total elapsed time was 1 second.

    Explanation
    ---------------------------------------------------------------------------
      1) First, we do a single-AMP RETRIEVE step from rxw05.order_log by
         way of the unique primary index "rxw05.order_log.log_No = 15"
         with no residual conditions. The estimated time for this step is
         0.03 seconds.
      -> The row is sent directly back to the user as the result of
         statement 1. The total estimated time is 0.03 seconds.

Process SQL statement and access data using secondary index (using EXPLAIN to display how data is accessed).

* The rxw05.organic table has a unique primary index (serial_No) and a unique secondary index (organic_name).

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    select * from rxw05.organic where organic_name = 'methanol';

    *** Query completed. 1 rows found. 4 columns returned.
    *** Total elapsed time was 1 second.

      serial_No  organic_name     Carbon_number  amount
    -----------  ---------------  -------------  ------
              1  methanol                     1     500

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    explain select * from rxw05.organic where organic_name = 'methanol';

    *** Help information returned. 6 rows.
    *** Total elapsed time was 1 second.

    Explanation
    ---------------------------------------------------------------------------
      1) First, we do a two-AMP RETRIEVE step from rxw05.organic by way of
         unique index # 4 "rxw05.organic.organic_name = 'methanol'" with
         no residual conditions. The estimated time for this step is 0.07
         seconds.
      -> The row is sent directly back to the user as the result of
         statement 1. The total estimated time is 0.07 seconds.

Process SQL statement and access data by full base table scan (using EXPLAIN to display how data is accessed).

* The rxw05.order_log table has a unique primary index (log_No) and a secondary index (student_name).

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    select * from rxw05.order_log where log_no>10;

    *** Query completed. 2 rows found. 4 columns returned.
    *** Total elapsed time was 1 second.

    log_No  student_name     order_date  checkin_date
    ------  ---------------  ----------  ------------
        15  Jone Smith       12/02/99    12/16/99
        11  Adam             11/01/99    11/15/99

    BTEQ -- Enter your DBC/SQL request or BTEQ command:
    explain select * from rxw05.order_log where log_no>10;

    *** Help information returned. 12 rows.
    *** Total elapsed time was 1 second.

    Explanation
    ---------------------------------------------------------------------------
      1) First, we lock a distinct rxw05."pseudo table" for read on a
         RowHash to prevent global deadlock for rxw05.order_log.
      2) Next, we lock rxw05.order_log for read.
      3) We do an all-AMPs RETRIEVE step from rxw05.order_log by way of an
         all-rows scan with a condition of ("rxw05.order_log.log_No > 10")
         into Spool 1, which is built locally on the AMPs. The size of
         Spool 1 is estimated with no confidence to be 27 rows. The
         estimated time for this step is 0.15 seconds.
      4) Finally, we send out an END TRANSACTION step to all AMPs involved
         in processing the request.
      -> The contents of Spool 1 are sent back to the user as the result of
         statement 1. The total estimated time is 0.15 seconds.
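The three EXPLAIN outputs differ mainly in how many AMPs take part in the retrieval. The Python sketch below summarizes just those three cases. It is a deliberately simplified illustration, not the optimizer's logic: the `TableIndexes` class and the `access_path` function are hypothetical names, statistics and costs are ignored, and non-unique secondary indexes are not modeled at all (in this sketch they fall through to the all-AMPs case).

```python
from dataclasses import dataclass, field


@dataclass
class TableIndexes:
    """Hypothetical description of the unique indexes defined on one table."""
    unique_primary_index: str
    unique_secondary_indexes: set = field(default_factory=set)


def access_path(indexes: TableIndexes, column: str, operator: str) -> str:
    """Classify a single-column WHERE condition into one of three access paths.

    Only the cases shown in the EXPLAIN examples are covered:
    equality on the unique primary index, equality on a unique
    secondary index, and everything else (treated as a full scan).
    """
    col = column.lower()  # column names are compared case-insensitively
    if operator == "=" and col == indexes.unique_primary_index.lower():
        return "single-AMP"
    usi_columns = {c.lower() for c in indexes.unique_secondary_indexes}
    if operator == "=" and col in usi_columns:
        return "two-AMP"
    return "all-AMPs"


if __name__ == "__main__":
    order_log = TableIndexes(unique_primary_index="log_No")
    organic = TableIndexes(unique_primary_index="serial_No",
                           unique_secondary_indexes={"organic_name"})
    print(access_path(order_log, "log_No", "="))       # where log_No = 15
    print(access_path(organic, "organic_name", "="))   # where organic_name = 'methanol'
    print(access_path(order_log, "log_no", ">"))       # where log_no > 10
```

The range condition log_no > 10 falls into the all-AMPs case even though log_No is the primary index, because hashing can locate a row only for an equality match on the full index value.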