Indexes, Too Much of a Good Thing?

Some of you may be familiar with relational databases other than Teradata and how those other RDBMSs use indexes. In Teradata, an "Index" is a physical mechanism used to distribute, store, and access data rows. Indexes provide a physical access path to the data, and their use can avoid unnecessary full-table scans to locate rows.

To level set, let's first consider the difference between a Key and an Index. A "Key" is a relational term. The Primary Key of a table is a column set that uniquely identifies a row in a logical table. A key is an identifier and not a physical mechanism.

In Teradata, there are four main types of indexes: Primary Indexes, Secondary Indexes, Join Indexes, and Hash Indexes. Each of these types has its own distinctive flavors and uses.

All Teradata Database tables require a Primary Index because the system distributes table rows to the AMPs based on primary index values. (* Teradata 13.0 has a NoPI feature for loading staging tables.) To accommodate Teradata's massively parallel architecture, indexes use a hashing algorithm based on data row hash values as the most efficient means of distributing and retrieving data. Primary indexes can be either unique or non-unique and partitioned or non-partitioned:
• Unique primary index (UPI)
• Non-unique primary index (NUPI)
• Non-partitioned primary index (NPPI)
• Partitioned primary index (PPI)

Partitioned primary indexes may have a single partitioning expression or multiple partitioning expressions:
• Single-level PPI (SLPPI)
• Multilevel PPI (MLPPI)
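As a rough illustration of hash-based row distribution, the sketch below assigns rows to AMPs from a hash of the primary index value and compares a unique index column with a non-unique one. It is not Teradata's algorithm: the MD5 hash, the modulo mapping, the four-AMP system, the sample data, and the names `amp_for` and `rows_per_amp` are all assumptions made for the example.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # hypothetical AMP count, chosen only for the example


def amp_for(primary_index_value, num_amps=NUM_AMPS):
    """Map a primary index value to an AMP number.

    Stand-in only: MD5 plus modulo replaces Teradata's own row-hash
    algorithm and hash map, which are not reproduced here.
    """
    digest = hashlib.md5(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(primary_index_values, num_amps=NUM_AMPS):
    """Count how many rows land on each AMP."""
    counts = Counter(amp_for(v, num_amps) for v in primary_index_values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


# UPI-like column: every value is distinct (for example, an order number).
unique_values = range(10_000)
# NUPI-like column: only three distinct values (for example, a status code).
skewed_values = ["OPEN"] * 7_000 + ["CLOSED"] * 2_500 + ["HOLD"] * 500

print("unique:", rows_per_amp(unique_values))
print("skewed:", rows_per_amp(skewed_values))
```

Because equal values always hash to the same AMP, the three-value column puts at least 7,000 of its 10,000 rows on a single AMP, while the unique column spreads out nearly evenly. That is the kind of skew to watch for when a non-unique primary index has few distinct values.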

The main purposes for Teradata's Primary Indexes:

• Provide access to data rows, obviating the need to do full-table scans. By providing the values for all the primary index columns in your SQL WHERE clause, direct access to the AMP with the row(s) can be made using that primary index value. (Note: Teradata does not do "partial" indexing; you must provide values for all the columns defined in an index.)
• Determine which AMP a row will distribute to. Rows are distributed across the AMPs based on the hash of their Primary Index. The uniqueness of the data values in the primary index will affect how evenly the data is distributed across the AMPs.
• Efficiency of join processing. Depending upon the choice of index, rows of the table may or may not have to be redistributed, spooled, and sorted prior to the join.

Up to 64 columns can be specified for a primary index definition.

A trade-off that must often be considered is between the needs of full table scan queries and queries that can be range constrained in some way. For example, between strategic queries that need to scan a multi-year history of sales, and tactical queries that only need transactions from, say, the last 60 days. The compromise that's often made is to break the large table into a number of range partitioned tables, which work better for tactical queries than scanning one big table, and to UNION the partitioned tables together for strategic queries that need all the data. But breaking up what should be one table into multiple tables makes them harder to manage and maintain, and although it improves performance for tactical queries, it is not the optimum solution.

What if, instead of having to fully scan a large table to satisfy a query, we only had to scan 50% of it? What if we only had to scan 10%, or perhaps even less? This would have a huge, very positive impact on performance, and this is what PPI has been designed to address.

Partitioned Primary Index (PPI) is a table organization scheme that very elegantly enhances the existing Teradata structures and helps remove this trade-off dilemma. It's an optional extension to Teradata's hashed primary indexing that adapts it to more efficiently handle range queries, to push the envelope for Teradata users who want to see better and better query performance whether it's for tactical or strategic requests. PPI has also been designed to be very easily set up and managed and to put a minimum additional burden on the DBA, in keeping with our philosophy of low cost database management.

Secondary Indexes provide alternate access paths to the data and may be Hash-ordered or Value-ordered. Teradata does not do partial index retrievals, however. Secondary indexes may be added or dropped as needed, with the caveat that building them requires some amount of system resources and you'll want to check with your DBA as appropriate.

• Unique secondary index (USI)
• Nonunique secondary index (NUSI)

Join Indexes come in six subtypes:

• Single-table join index (STJI)
• Single-table aggregate join index (STAJI)
• Single-table sparse join index
• Multitable simple join index
• Multitable aggregate join index
• Multitable sparse join index

Join Indexes are highly recommended for decision support applications because they often provide superior performance to large table joins and aggregate computations. Any Join Index, whether simple or aggregate, multi-table or single-table, can be sparse. The create statement of a sparse index uses a constant expression in the WHERE clause of its definition to narrowly filter its row population. A caution when choosing Join Indexes is that neither the MultiLoad nor the FastLoad utility supports tables with join indexes. If Join Indexes are needed, a possible workaround could be to FastLoad data into an empty table and then either MERGE or do an INSERT/SELECT into the table with the Join Index.

Criteria for choosing Join Indexes vs NUSIs

The similarities between STJI and NUSI are:

• STJIs can be defined with the same columns as a NUSI, which will allow the Optimizer to use the index to satisfy the query rather than the underlying base table.
• The concept of index covering (the query can be satisfied by columns in the index without accessing the base table) applies to both STJI and NUSI.
• Value ordering is available on both STJI and NUSI.
• The Optimizer can use either STJIs or NUSIs to do Full Table Scans of their subtables instead of a FTS of the base table.
• Joins to either STJIs or NUSIs are possible.

The basic differences between STJI and NUSI are:

• A STJI is similar to a table with a primary index with additional columns defined.
• A STJI row can be stored on the same AMP as the table data row or on a different AMP, whereas NUSI rows are stored on the same AMP as the table data row.
• NUSIs are supported by MultiLoad, but STJIs are not.

Hash Indexes are similar to Join Indexes and have a narrower usage. Hash Indexes are limited to one table, and the table's Primary Index cannot be partitioned. Hash indexes are useful for queries where the index contains the columns referenced by a query. Hash indexes can also be defined on a table in place of traditional secondary indexes. As with the Join Index, FastLoad and MultiLoad do not support Hash Indexes.

Advantages of Indexes

Indexes are a retrieval mechanism, and the intent of indexes is to lessen the time it takes to retrieve rows from a database and eliminate full table scans.

Disadvantages of Indexes

This is where too much of a good thing can be a disadvantage. All Teradata secondary, hash, and join indexes are stored in subtables and will require extra storage space. Whenever a base table row is updated, deleted, or inserted, index subtables must also be updated. The more secondary, join, or hash indexes you have, the more maintenance will have to be performed. Some of the indexes are "allergic" to some Load Utilities, so you'll want to check on what may be impacted as a result of your index choices.

One of the most important tasks of a DBA is to choose Indexes. A good practice is to use EXPLAIN before you execute a query; it will help you determine which indexes are being used for your query. CHOOSE WISELY!

http://developer.
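The maintenance cost described under Disadvantages of Indexes can be sketched with a toy model in which every base-table write also touches one lookup structure per index. This is only a hypothetical illustration: the `IndexedTable` class, the `writes_for_load` helper, the column names, and the "one write per subtable" cost model are assumptions, and none of it mirrors how Teradata actually stores or maintains index subtables.

```python
from collections import defaultdict


class IndexedTable:
    """Toy base table plus one lookup structure per indexed column.

    Hypothetical model for illustration; it does not mirror how
    Teradata lays out or maintains index subtables.
    """

    def __init__(self, indexed_columns):
        self.rows = {}  # row_id -> row dict (the "base table")
        # column -> value -> set of row_ids (one "subtable" per index)
        self.subtables = {col: defaultdict(set) for col in indexed_columns}
        self.writes = 0  # base-table writes plus subtable writes

    def insert(self, row_id, row):
        self.rows[row_id] = row
        self.writes += 1
        for col, subtable in self.subtables.items():
            subtable[row[col]].add(row_id)
            self.writes += 1

    def delete(self, row_id):
        row = self.rows.pop(row_id)
        self.writes += 1
        for col, subtable in self.subtables.items():
            subtable[row[col]].discard(row_id)
            self.writes += 1

    def lookup(self, col, value):
        """Use the index on `col` instead of scanning every row."""
        return sorted(self.subtables[col][value])


def writes_for_load(indexed_columns, row_count=1_000):
    """Load `row_count` sample rows and return the total write count."""
    table = IndexedTable(indexed_columns)
    for i in range(row_count):
        table.insert(i, {"customer": i % 50, "region": i % 5, "status": i % 3})
    return table.writes


print(writes_for_load([]))
print(writes_for_load(["customer"]))
print(writes_for_load(["customer", "region", "status"]))
```

In this model, loading 1,000 rows costs 1,000 writes with no indexes, 2,000 with one index, and 4,000 with three. Each index buys a faster `lookup` at the price of one more structure to maintain on every insert, update, or delete.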
