
DBA BEST PRACTICES DB2 UDB LUW SQL TUNING

FEBRUARY 2010

TABLE OF CONTENTS

1.0 Overview
2.0 Introduction
3.0 UDB DB2 Database Manager Background
4.0 Assumptions
5.0 Best Practices
    5.1 Best Practices for Database Configuration
        5.1.1 Database Optimization Class Registry Setting
        5.1.2 Database Manager Instance Configuration File Parameters
        5.1.3 Database Configuration File Parameters
        5.1.4 Database Bufferpool and Tablespace Configuration
    5.2 Database Table and Index Best Practices
        5.2.1 Database Table and Index Design
    5.3 UDB DB2 Database RUNSTATS
        5.3.1 RUNSTATS Command
    5.4 UDB DB2 Database Table Reorganization
        5.4.1 REORGANIZE and REORGCHK Commands
    5.5 SQL Workload Tuning Best Practices
        5.5.1 Prioritize then Divide and Conquer
        5.5.2 Get Baseline Run Times and EXPLAIN Plans
        5.5.3 Best Practice Coding Techniques
        5.5.4 Review Joins and Indexes
        5.5.5 Review All Selected Columns and Table Indexes
        5.5.6 Retest the Entire Work Load After SQL Performance Tuning
        5.5.7 DB2 Index Advisor (db2advis - DB2 Design Advisor Command)
    5.6 Explain Tools
        5.6.1 Visual Explain Tool
        5.6.2 SQL and XQuery Explain Tool (db2expln Facility)
6.0 Appendix

2010 Computer Sciences Corporation.

BUSINESS INTELLIGENCE PRACTICE DATABASE ADMINISTRATION COMPETENCY

1.0 Overview
The intent of this document is to describe best practices for SQL tuning for DB2 databases in LUW environments. The document covers:

Database maintenance best practices
Database configuration for best performance
Database design issues for best performance
SQL coding best practices
SQL explain tools for performance tuning
Revision Date: 02/02/2010
Revised By: Bruce Woodcraft
Revision Summary: Initial draft

Version 1

2.0 Introduction
This document describes best practices for writing Structured Query Language (SQL) scripts which retrieve data from an IBM DB2 database running on a Linux, UNIX, or Windows (LUW) server. It covers best practices for writing SQL, reviewing the database maintenance that affects data retrieval, database configuration parameters that impact performance, database object design issues for tables and indexes, and using the explain tools to assist in performance tuning activities.

SQL query tuning factors can be broken down into several categories:

Database Configuration
Database Object Maintenance
Database Object Design (Tables and Indexes)
SQL Coding Techniques
DB2 Explain Plan Tools

There are many factors that determine the performance of a given SQL query, many of which are beyond the control of the SQL query developer. For instance, database configuration parameter settings and table maintenance activities are controlled by the DBA; the SQL developer most likely does not have access to change or modify them. It has been widely documented in the database tuning literature that the SQL query script itself is the single largest performance factor in more than three out of four cases. For this reason, this document places the greatest focus on SQL coding techniques for performance. The other contributing factors are discussed in far less detail, as their remedies are covered in other documents and are beyond the scope of this one.


3.0 UDB DB2 Database Manager Background


Before discussing these SQL tuning factors, we should first consider some background on IBM's Universal DB2 Database Manager for LUW environments. The most important component of the product relevant to running queries to retrieve data is the Optimizer. The optimizer of any Relational Database Management System (RDBMS) provides the intelligence for determining the best steps for accessing and retrieving the data needed to satisfy a query. This set of database tasks is known as the optimized access path. Thus the optimizer determines how queries will be performed within the database and is the distinguishing component among RDBMSs. Below is a brief description of DB2's optimizer from an IBM technical article titled "Coding DB2 SQL for Performance: The Basics".
http://www.ibm.com/developerworks/data/library/techarticle/0210mullins/0210mullins.html#author

The Optimizer
The optimizer is the heart and soul of DB2. It analyzes SQL statements and determines the most efficient access path available for satisfying each statement (see Figure 1). DB2 UDB accomplishes this by parsing the SQL statement to determine which tables and columns must be accessed. The DB2 optimizer then queries system information and statistics stored in the DB2 system catalog to determine the best method of accomplishing the tasks necessary to satisfy the SQL request.

Figure 1. DB2 optimization in action.


The optimizer is equivalent in function to an expert system. An expert system is a set of standard rules that, when combined with situational data, returns an "expert" opinion. For example, a medical expert system takes the set of rules determining which medication is useful for which illness, combines it with data describing the symptoms of ailments, and applies that knowledge base to a list of input symptoms. The DB2 optimizer renders expert opinions on data retrieval methods based on the situational data housed in DB2's system catalog and a query input in SQL format.

The notion of optimizing data access in the DBMS is one of the most powerful capabilities of DB2. Remember, you access DB2 data by telling DB2 what to retrieve, not how to retrieve it. Regardless of how the data is physically stored and manipulated, DB2 and SQL can still access that data. This separation of access criteria from physical storage characteristics is called physical data independence. DB2's optimizer is the component that accomplishes this physical data independence.

If you remove the indexes, DB2 can still access the data (although less efficiently). If you add a column to the table being accessed, DB2 can still manipulate the data without changing the program code. This is all possible because the physical access paths to DB2 data are not coded by programmers in application programs, but are generated by DB2. Compare this with non-DBMS systems in which the programmer must know the physical structure of the data. If there is an index, the programmer must write appropriate code to use the index. If someone removes the index, the program will not work unless the programmer makes changes. Not so with DB2 and SQL.

All this flexibility is attributable to DB2's capability to optimize data manipulation requests automatically. The optimizer performs complex calculations based on a host of information.
To visualize how the optimizer works, picture it as performing a four-step process:

1. Receive and verify the syntax of the SQL statement.
2. Analyze the environment and optimize the method of satisfying the SQL statement.
3. Create machine-readable instructions to execute the optimized SQL.
4. Execute the instructions or store them for future execution.

The second step of this process is the most intriguing. How does the optimizer decide how to execute the vast array of SQL statements that you can send its way? The optimizer has many types of strategies for optimizing SQL. How does it choose which of these strategies to use in the optimized access paths? IBM does not publish the actual, in-depth details of how the optimizer determines the best access path, but the optimizer is a cost-based optimizer. This means the optimizer will always attempt to formulate an access path for each query that reduces overall cost. To accomplish this, the DB2 optimizer applies query cost formulas that evaluate and weigh four factors for each potential access path: the CPU cost, the I/O cost, statistical information in the DB2 system catalog, and the actual SQL statement.


4.0 Assumptions
This document assumes the target audience has some experience with and knowledge of SQL query scripting on a relational database, and it points out specific best practices for using IBM's UDB DB2 database product in Linux, UNIX, and Windows (LUW) environments. Detailed UDB DB2 instance and database parameter configuration is beyond the scope of this paper, but these settings are briefly mentioned below because they play an important role in overall performance optimization.

5.0 Best Practices


5.1 Best Practices for Database Configuration
This section describes some UDB DB2 system and database configuration parameters, changeable by a DBA, that can have the greatest impact on SQL query performance. These are examples of the "Other System Information" shown in the optimizer diagram (Figure 1) above. The parameters are mentioned here but are covered in more detail in the Best Practices for Database Design for UDB DB2.

CAUTION: Only the DBA should consider tuning these settings, as they will impact all database activity, so the utmost level of caution is needed.

5.1.1 DATABASE OPTIMIZATION CLASS REGISTRY SETTING
Changing the setting of the Optimization Class registry variable can provide some of the advantages of explicitly specifying optimization techniques, especially in the following cases:

To manage very small databases or very simple dynamic queries
To accommodate memory limitations at compile time on your database server
To reduce query compilation time, such as for PREPARE

A query optimization class is a set of query rewrite rules and optimization techniques for compiling queries. Per IBM's UDB Information Center for LUW on this subject: most statements can be adequately optimized with a reasonable amount of resources by using optimization class 5, which is the default query optimization class. At a given optimization class, query compilation time and resource consumption are primarily influenced by the complexity of the query, particularly the number of joins and subqueries. However, compilation time and resource usage are also affected by the amount of optimization performed. Query optimization classes 1, 2, 3, 5, and 7 are all suitable for general-purpose use. Consider class 0 only if you require further reductions in query compilation time and you know that the SQL statements are extremely simple. To set the query optimization class for dynamic SQL, enter the following command in the command line processor:

SET CURRENT QUERY OPTIMIZATION = n;
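As a hedged sketch of the command above (the class values are chosen for illustration only; consult the DBA before changing the class), a session compiling a batch of very simple dynamic queries might temporarily lower the class and then restore the default:

```sql
-- Lower the optimization class before compiling very simple dynamic SQL
-- (class 1 performs minimal optimization, reducing compile time).
SET CURRENT QUERY OPTIMIZATION = 1;

-- ... prepare and execute the simple statements here ...

-- Restore the default class for the remainder of the session.
SET CURRENT QUERY OPTIMIZATION = 5;
```

The setting applies only to dynamic SQL compiled in the current session; statically bound packages keep the class they were bound with.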

Again, CAUTION should be used when changing this setting. More information and a complete discussion of this setting can be found in the IBM UDB Information Center for LUW. http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp

5.1.2 DATABASE MANAGER INSTANCE CONFIGURATION FILE PARAMETERS
Each UDB DB2 instance has an instance configuration file that contains 68 parameters. A few that have a significant impact on performance are listed below.

Table source: IBM Redbook DB2 UDB Enterprise Edition V8.1: Basic Performance Tuning Guidelines http://www.redbooks.ibm.com/redpapers/pdfs/redp4251.pdf

These parameters should be tuned by the database support DBA with CAUTION. For further detail on these parameters, see the source document.

5.1.3 DATABASE CONFIGURATION FILE PARAMETERS
Each UDB DB2 database has its own database configuration file, which contains 82 different parameters. Below are the parameters that could have the greatest performance impact. Again, use caution when changing any UDB DB2 parameter.


Table source: IBM Redbook DB2 UDB Enterprise Edition V8.1: Basic Performance Tuning Guidelines http://www.redbooks.ibm.com/redpapers/pdfs/redp4251.pdf

Like the DB2 instance settings that can be tuned, there are many DB2 database configuration settings that can have a significant effect on database performance. Several key settings are: AVG_APPLS, which the optimizer uses to estimate how much buffer pool memory each application will get; CATALOGCACHE_SZ, which determines how much memory is used to cache the system catalog; and SORTHEAP, which specifies the amount of memory available for each sort operation. The details of tuning these parameters are discussed in the IBM Redbook referenced above, in the UDB DB2 Database Tuning Best Practices, and in IBM's UDB DB2 administration manuals.

5.1.4 DATABASE BUFFERPOOL AND TABLESPACE CONFIGURATION
In any database design and configuration, the size and allocation of the database's bufferpools and tablespaces are among the most important factors for improving performance. Buffer pools are used to cache data in memory for reading and writing to disk, and data is served much faster from memory than from disk. Generally, just a few buffer pools of different page sizes are needed to handle the different tablespace page sizes. Special-purpose buffer pools may be created for specific data and processing methods. Likewise, there are many sizes of tablespaces and special-purpose tablespaces; for instance, temporary tablespaces are created and assigned to specific buffer pools. UDB DB2 has options for partitioning large tables into multiple tablespaces for data separation and faster I/O performance. Specific data that is used frequently can be set up in its own bufferpool and tablespace so it stays in memory for fast access. In tuning queries you may come across often-used data that can be separated out and tuned in this fashion. Tablespace changes, and to a lesser extent bufferpool changes, may be needed to optimize a given query workload; these are the responsibility of a DBA, not a developer.
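A minimal sketch of a dedicated bufferpool and tablespace for hot data, as described above (all names and sizes are hypothetical assumptions, not recommendations, and such DDL is DBA work):

```sql
-- Dedicated buffer pool for a small, frequently read reference table,
-- sized so its pages tend to stay resident in memory.
CREATE BUFFERPOOL bp_ref_4k SIZE 10000 PAGESIZE 4K;

-- Tablespace bound to the dedicated buffer pool (automatic storage).
CREATE TABLESPACE ts_ref
    PAGESIZE 4K
    BUFFERPOOL bp_ref_4k;

-- Place the hot reference table in the dedicated tablespace.
CREATE TABLE app.country_codes (
    country_cd CHAR(2) NOT NULL PRIMARY KEY,
    country_nm VARCHAR(60)
) IN ts_ref;
```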
Remember, database configuration changes like those mentioned above need to be made with CAUTION, as they could be counterproductive to other queries in the workload, especially if one bufferpool is reduced to create another. It is for this reason that workloads need to be tuned and measured as a group, after individually looking at the slow performers and the most often run queries. (Do not underestimate the improvement that can be made to the overall runtime of a workload by tuning a small query that is run a million times.)


5.2 Database Table and Index Best Practices


Tables organize and group the data that fills the database, while indexes provide maps to specific data in the tables and speed I/O processing. Good design and planning here will immediately impact the database's performance.

5.2.1 DATABASE TABLE AND INDEX DESIGN
Two other key elements of an optimally performing database are the design and function of its tables and indexes. Too often tables are mere collections of fields, with no thought for function and use put into their design; indexes get added to provide the tables a key, but the design ends there. Tables with too many columns may need to be split into two parts: one with the most used columns and one with the least used columns. Some tables that are constantly joined to another table may be combined for operational efficiency, despite not being in fourth normal form. More detail on the benefits of good table design can be found in the UDB DB2 Database Design Best Practices. Note, however, that table design and structure play an important role in tuning every query that reads from or joins to a table.

UDB DB2 offers a variety of table structures to store and retrieve data for optimal performance: Range-Clustered Tables (RCT), Multi-Dimensional Clustering (MDC) tables, and, for even larger tables, Range Partitioned (RP) tables. These table structures have specific indexing methods that are very beneficial when used properly. Again, see the UDB DBA Database Best Practices for more detail on these table structures and indexing methods.

One of the biggest factors affecting query performance is which indexes are available for the optimizer to use. The primary role of indexes is to shorten the path of the access plan so that the data can be retrieved as fast as possible. Indexes perform a powerful service for the database.
Sometimes creating too many indexes, or adding too many columns to a particular index, is detrimental to the entire workload, especially when inserting or updating records in the over-indexed table. Adding indexes to a table is always a tradeoff between retrieval time on one hand and maintenance time plus storage space on the other. Usually retrieval time is more important, and index maintenance is done during a batch cycle when no one is waiting on it to finish. Also, UDB DB2 v9.7 has new index compression features that make indexes smaller and faster to use, offsetting some of the cost associated with an index on a larger table.
Rule to Remember:

Five to seven indexes per table, with five to nine columns at most. Most if not all tables will have an index of some kind. Generally most have a unique index that serves as the Primary Key and is explicitly stated as the Primary Key. (Note that in UDB DB2 it can be created as a CONSTRAINT, and an index will be created for it.)


Rule to Remember:

Use the Primary Key on a table whenever possible, unless another index provides more columns and faster Access Path.

Unique indexes can be created on columns other than the Primary Key (PK); these are referred to as Alternate Keys (AK). For example, a sequence number (or identity column) may be added to the row to provide a sequential numeric column to use as the PK, while a group of other columns forms the natural key as a unique combination of columns. Unique indexes may INCLUDE other non-indexed columns that provide a direct data source for a few table columns. This becomes an extremely effective tool, especially for large rows with many columns. Adding a few extra columns to the unique index (or AK) permits the I/O to be limited to the index only, saving large row reads. This I/O technique is known as Index-Only Reads and is quite efficient compared to reading both the index and the data rows.

In a Snowflake or Hub-and-Spoke data model, where a few fact tables are linked to numerous attribute tables, the fact table should have single-column attribute key indexes that match the indexes of the attribute tables. UDB DB2 has a special join operator called the STAR JOIN, which handles this type of join and index processing in a highly efficient way using RID processing and index ANDing. See the IBM UDB Information Center for complete details of the STAR JOIN.
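A minimal sketch of the INCLUDE technique described above (the table and column names are hypothetical):

```sql
-- Unique index on the natural key, carrying two frequently selected
-- columns in the index leaf pages via INCLUDE.
CREATE UNIQUE INDEX app.ix_cust_natural
    ON app.customer (last_nm, first_nm, birth_dt)
    INCLUDE (email_addr, phone_nbr);

-- A query touching only indexed and INCLUDEd columns can be satisfied
-- with an index-only read, never fetching the wide data rows.
SELECT email_addr, phone_nbr
FROM app.customer
WHERE last_nm = 'SMITH'
  AND first_nm = 'JANE'
  AND birth_dt = '1980-06-01';
```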

5.3 UDB DB2 Database RUNSTATS


As seen in the optimizer diagram above, the UDB DB2 database uses system catalog statistics to assist the optimizer in determining the best steps to retrieve the needed data. Below we discuss the importance of this data and the necessity of keeping it up to date.


5.3.1 RUNSTATS COMMAND
The UDB DB2 database uses catalog statistics and column distribution counts to help the optimizer determine the optimal data access path. Because the optimizer uses these counts to estimate the costs of various steps, the statistics are critical to the decision-making process. The RUNSTATS command is used to generate fresh row counts and column distributions after a table has been modified in a significant way since the last time RUNSTATS was run.
Rule to Remember:

Run RUNSTATS command after significant changes or a total refresh of a table.
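A typical invocation might look like the following (the schema and table names are hypothetical); WITH DISTRIBUTION gathers the column distribution statistics mentioned above:

```sql
-- Refresh table, column-distribution, and detailed index statistics
-- after a large load or purge of APP.ORDERS.
RUNSTATS ON TABLE app.orders
    WITH DISTRIBUTION
    AND DETAILED INDEXES ALL;
```

Statements compiled (or packages rebound) after the RUNSTATS will pick up the new statistics.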


5.4 UDB DB2 Database Table Reorganization


Another important UDB DB2 database maintenance command is the REORG (reorganize) command, which rearranges the rows in a table or index while removing deleted rows.

5.4.1 REORGANIZE AND REORGCHK COMMANDS
In UDB DB2, DBAs use the REORGCHK command to test tables to see whether the REORG command needs to be run on them.

Rule to Remember:

Run the REORG command after significant deletions from and additions to a table or index.

The REORGCHK command calculates statistics on the database to determine whether tables or indexes, or both, need to be reorganized or cleaned up.

Rule to Remember:

Run REORGCHK command to check to see if a table or index needs to be cleaned up.
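A hedged sketch of the check-then-reorganize sequence (the table name is hypothetical):

```sql
-- Update statistics and report tables/indexes whose REORGCHK formulas
-- exceed the thresholds (flagged with * in the report output).
REORGCHK UPDATE STATISTICS ON TABLE app.orders;

-- If flagged, reorganize the table and its indexes, then refresh stats
-- so the optimizer sees the reorganized layout.
REORG TABLE app.orders;
REORG INDEXES ALL FOR TABLE app.orders;
RUNSTATS ON TABLE app.orders WITH DISTRIBUTION AND DETAILED INDEXES ALL;
```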


5.5 SQL Workload Tuning Best Practices


5.5.1 PRIORITIZE THEN DIVIDE AND CONQUER
In most database environments there is a large set of SQL statements run against the database in any given time window. Some statements are repeated daily by on-line applications or report programs; others are ad hoc queries run once by a single user. After capturing the complete set of statements, subdivide them by application and user priority. Also reduce the ad hoc queries to a representative subset, as it will be impossible to optimize the database for every query, let alone ad hoc queries that may be run only once. Identify the queries that are run most often, as optimizing these will return big savings over time. Batch report queries need to run efficiently but may not be prioritized as high as on-line screen queries needing sub-second response time. Review and tune the queries based on their priority and use. Focus on the most important queries and those with the most visibility.

5.5.2 GET BASELINE RUN TIMES AND EXPLAIN PLANS
Once you have determined your query workload to tune, get baseline run times and Explain Plans. These will be needed for comparison, to measure performance improvement during and at the end of the tuning process.

5.5.3 BEST PRACTICE CODING TECHNIQUES
There are some basic SQL coding techniques to follow to ensure the best performance from a SQL script. SQL should be written to return exactly the data needed, with the minimal steps and amount of data processed. Queries need to use column and row filtering to quickly reduce the possible rows in the returned record set. The use of indexed columns, simple predicates, and the avoidance of bad coding techniques will help the optimizer determine the best data access path for the query. Below are a few guidelines to keep in mind when coding and reviewing SQL scripts for optimal performance.
Keep WHERE Expressions Simple - When it comes to WHERE conditions, the simpler the better. Try to avoid complex expressions that prevent the optimizer from using the catalog statistics to estimate an accurate selectivity. Such expressions might also limit the choices of access plans that can be used to apply the predicate.

Avoid Functions in JOINs - JOINs will be limited to slower nested-loop joins when one of the join predicates contains an expression or function. The expressions may also cause the cardinality estimates to be inaccurate, causing the optimizer to select a non-optimal path.

Avoid Expressions on JOIN Columns - Try to avoid expressions on JOIN columns where an index exists, as they disqualify the use of the index. If possible, rewrite the query using the indexed columns directly, or apply the reverse operation of the expression to the other side of the predicate. Applying expressions over columns prevents the use of index start and stop keys, leads to inaccurate selectivity estimates, and requires extra processing at query execution time. These expressions also prevent or hamper query rewrite optimization steps.

Match JOIN Column Types - Avoid mismatched JOIN values, as data type mismatches prevent the use of hash joins. Also note that if the JOIN column data type is CHAR, GRAPHIC, DECIMAL, or DECFLOAT, the lengths must be the same.

Avoid Non-Equality JOINs - JOIN predicates that use comparison operators other than equality should be avoided, because the join method is limited to nested loop. Also, the optimizer might not be able to compute an accurate selectivity estimate for the JOIN predicate. When a non-equality JOIN cannot be avoided, be sure an appropriate index exists on either table, because the join predicates will be applied on the inner of the nested-loop join.

Don't Use DISTINCT Aggregations - The DISTINCT function causes a sort of the final result set, making it one of the more expensive sorts. Note that as of DB2 V9 the optimizer will look to take advantage of an index to eliminate the sort for uniqueness, as it already does when optimizing a GROUP BY. Rewriting the SQL script using a GROUP BY or a sub-SELECT (or IN predicate) will usually be more efficient. Also, avoid multiple DISTINCT aggregations (e.g., SUM(DISTINCT colx), AVG(DISTINCT coly)) in the same SELECT; this becomes very expensive because the optimizer rewrites the original query into separate aggregations and SORTs, one for each DISTINCT specification, and then combines the multiple aggregations using a UNION operation.

Avoid Outer Joins Unless Necessary - The left outer join can prevent a number of optimizations, including the use of specialized star-schema join access methods. However, in some cases the left outer join can be automatically rewritten to an inner join by the query optimizer, depending on the other predicates in the SQL script. The inner equijoin is often more efficient, so use it where possible.

Tell the Optimizer How Many Rows to Expect - When the result set size is known or can be closely estimated, use the OPTIMIZE FOR N ROWS clause along with the FETCH FIRST N ROWS ONLY clause. OPTIMIZE FOR N ROWS indicates to the optimizer that the application intends to retrieve only N rows, but the query will return the complete result set. FETCH FIRST N ROWS ONLY indicates that the query should return only N rows. Together they encourage query access plans that return rows directly from the referenced tables, without first performing a buffering operation such as inserting into a temporary table, sorting, or inserting into a hash join hash table. NOTE: specifying OPTIMIZE FOR N ROWS to avoid buffering operations while still retrieving all rows of the result set can degrade performance, because the access plan that returns the first N rows fastest might not be the best plan when the entire result set is retrieved.

Avoid Redundant Predicates - Eliminate duplicate predicates, especially when they occur across different tables. In some cases the optimizer cannot detect that predicates are redundant, which can result in cardinality underestimation and the selection of a suboptimal access plan. Review the SQL script for columns with the same data but different column names where the same tests are being performed. Again, keep the predicates as simple as possible and remove the same test on similar columns wherever possible.

Select Only the Columns Needed - Avoid SELECT *, which returns all the columns for each row; this causes more I/O processing and slows down SORTs with needless data. Also, don't select columns whose value you already know from the SQL script, which causes more unneeded data handling. For example, SELECT A, B, C ... WHERE C = 1958 causes column C data to be processed needlessly. Likewise, don't select columns for sorting or grouping if those columns are not needed in the returned data set.

Select Only the Rows Needed - Reducing the set of rows returned in a result set makes the query handle less data and run faster. Use row-filtering predicates to limit the rows of data being returned. When writing a SQL script with multiple predicates, determine the predicate that will filter out the most data from the result set and place that predicate at the start of the list; by sequencing your predicates in this manner, the subsequent predicates have less data to filter and process.

Use an INDEX in Place of a SORT - Creating an index on commonly sorted data columns can save a SORT of the result set.
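A few of the guidelines above, sketched as before/after rewrites (the table and column names are hypothetical):

```sql
-- Avoid expressions on indexed columns: apply the reverse operation
-- to the constant side so an index on sale_dt stays usable.
-- Slower:  WHERE YEAR(sale_dt) = 2009
-- Better:
SELECT s.sale_id, s.amount
FROM app.sales s
WHERE s.sale_dt BETWEEN '2009-01-01' AND '2009-12-31';

-- Tell the optimizer how many rows to expect when only a screenful
-- of rows will actually be fetched.
SELECT c.cust_id, c.last_nm
FROM app.customer c
ORDER BY c.last_nm
FETCH FIRST 20 ROWS ONLY
OPTIMIZE FOR 20 ROWS;
```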

5.5.4 REVIEW JOINS AND INDEXES
Table joins should use indexed columns whenever possible for best performance. Review the JOINs and the columns used; ideally, use the Primary Key of at least one of the tables. Using indexed columns in JOINs permits the optimizer to use the column statistics and the index to determine the best access path, and can reduce I/O by using the index rather than the table data. The use of indexed columns in filtering predicates likewise reduces the processing and data handling required, by utilizing the indexes and index processing methods.

5.5.5 REVIEW ALL SELECTED COLUMNS AND TABLE INDEXES
The selected columns should be reviewed as well as the JOIN columns. The columns needed to satisfy the query may be available in the index used for a table JOIN or in an index used for accessing the table. If all of the selected columns are in an index, then I/O processing can be limited to just the index pages. This is known as an Index-Only Read, which is much more efficient than reading both the index and the data table. Note that UNIQUE indexes can have data columns INCLUDEd in the index pages. This is very useful when the majority of the needed columns are already in the index and another column or two is needed from the data row. If the row contains many columns, having all of the needed columns in an index becomes significantly more efficient than the alternative.

5.5.6 RETEST THE ENTIRE WORK LOAD AFTER SQL PERFORMANCE TUNING
Making index changes while tuning individual SQL statements may have unplanned impacts on other parts of a given workload, so it is important to retest the entire workload after tuning the SQL statements individually. Use the recorded baselines to compare performance improvements, and compare the ending explain plans and estimated TIMERONS (the unit of estimated run resource cost).

5.5.7 DB2 INDEX ADVISOR
DB2 has a tool to review and recommend indexes for a specified query workload. The tool reads a file of SQL statements and generates a list of used and recommended indexes for that workload (or a single statement), as well as a list of unused indexes. Its output specifies the estimated percentage of performance improvement for each newly recommended index, along with the index's expected size. Note that this tool may recommend a list of indexes to add for a given workload or statement; adding indexes involves a tradeoff of storage space and processing time, so be very cautious when adding them. See the IBM DB2 Information Center for further details on this tool.
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.qb.dbconn.doc/doc/c0004770.html

db2advis - DB2 design advisor command


The DB2 Design Advisor advises users on the creation of materialized query tables (MQTs) and indexes, the repartitioning of tables, the conversion to multidimensional clustering (MDC) tables, and the deletion of unused objects. The recommendations are based on one or more SQL statements provided by the user. A group of related SQL statements is known as a workload. Users can rank the importance of each statement in a workload and specify the frequency at which each statement in the workload is to be executed. The Design Advisor outputs a DDL CLP script that includes CREATE INDEX, CREATE SUMMARY TABLE (MQT), and CREATE TABLE statements to create the recommended objects.
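A hedged example of invoking the advisor from the command line (the database name and workload file are hypothetical; the -t option limits advise time in minutes):

```shell
# Analyze a workload file of SQL statements against database SAMPLE,
# limiting the advisor to 5 minutes and saving the recommended DDL.
db2advis -d SAMPLE -i workload.sql -t 5 -o recommendations.ddl
```

The generated DDL script should be reviewed by a DBA before any recommended objects are created.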

5.6 Explain Tools


DB2 provides two tools for generating explain plans for a given SQL statement. These tools are useful for reviewing and tuning queries because they identify which indexes are being used and where table scans are being performed.

5.6.1 VISUAL EXPLAIN TOOL This tool is available from the DB2 Control Center and graphically displays the explain plan for the specified SQL statement. See the IBM DB2 Information Center for further details on this tool.
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.qb.dbconn.doc/doc/c0004770.html

Visual Explain

Visual Explain lets you view the access plan for explained SQL or XQuery statements as a graph. You can use the information available from the graph to tune your queries for better performance.

Important: Access to Visual Explain through the Control Center tools has been deprecated in Version 9.7 and might be removed in a future release. For more information, see "Control Center tools have been deprecated." Access to Visual Explain through the Data Studio toolset has not been deprecated.

You can use Visual Explain to:

View the statistics that were used at the time of optimization. You can then compare these statistics to the current catalog statistics to help determine whether rebinding the package might improve performance.
Determine whether or not an index was used to access a table. If an index was not used, Visual Explain can help you determine which columns might benefit from being indexed.
View the effects of performing various tuning techniques by comparing the before and after versions of the access plan graph for a query.
Obtain information about each operation in the access plan, including the total estimated cost and the number of rows retrieved (cardinality).

An access plan graph shows details of:

Tables (and their associated columns) and indexes
Operators (such as table scans, sorts, and joins)
Table spaces and functions

Note: Visual Explain cannot be invoked from the command line; it is started from various database objects in the Control Center. To start Visual Explain:

From the Control Center, right-click a database name and select either Show Explained Statements History or Explain Query.
From the Command Editor, execute an explainable statement on the Interactive page or the Script page.
From Query Patroller, click Show Access Plan in either the Managed Queries Properties notebook or the Historical Queries Properties notebook.

5.6.2 DB2EXPLN FACILITY DB2 includes an operating-system-level command for generating the explain plan for a given SQL statement. See the IBM DB2 Information Center for further details on this tool.
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.qb.dbconn.doc/doc/c0004770.html

SQL and XQuery explain tool


The db2expln command describes the access plan selected for SQL or XQuery statements. You can use this tool to obtain a quick explanation of the chosen access plan when explain data was not captured. For static SQL and XQuery statements, db2expln examines the packages that are stored in the system catalog; for dynamic SQL and XQuery statements, it examines the sections in the query cache.

The explain tool is located in the bin subdirectory of your instance's sqllib directory. If db2expln is not in your current directory, it must be in a directory that appears in your PATH environment variable. The db2expln command uses the db2expln.bnd, db2exsrv.bnd, and db2exdyn.bnd files to bind itself to a database the first time the database is accessed.

Description of db2expln output Explain output from the db2expln command includes both package information and section information for each package.
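A typical dynamic-statement invocation might look like the following sketch. The database name and query are placeholders, and a running DB2 instance is required.

```shell
# Sketch: explain a dynamic statement against the (placeholder) SAMPLE
# database, writing the plan to the terminal (-t) and including an
# access-plan graph (-g).
db2expln -d SAMPLE -q "SELECT * FROM EMPLOYEE WHERE EMPNO = '000010'" -t -g
```

The terminal output shows the operators chosen for the statement, which makes it easy to spot table scans that might warrant an index.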


6.0 Appendix

Worldwide CSC Headquarters

The Americas
3170 Fairview Park Drive
Falls Church, Virginia 22042, United States
+1.703.876.1000

Europe, Middle East, Africa
Royal Pavilion, Wellesley Road
Aldershot, Hampshire GU11 1PZ, United Kingdom
+44 (0)1252.534000

Australia
26 Talavera Road
Macquarie Park, NSW 2113, Australia
+61 (0)2.9034.3000

Asia
139 Cecil Street, #06-00 Cecil House
Singapore 069539, Republic of Singapore
+65.6221.9095

About CSC
The mission of CSC is to be a global leader in providing technology-enabled business solutions and services. With the broadest range of capabilities, CSC offers clients the solutions they need to manage complexity, focus on core businesses, collaborate with partners and clients, and improve operations. CSC makes a special point of understanding its clients and provides experts with real-world experience to work with them. CSC is vendor-independent, delivering solutions that best meet each client's unique requirements. For more than 45 years, clients in industries and governments worldwide have trusted CSC with their business process and information systems outsourcing, systems integration, and consulting needs. The company trades on the New York Stock Exchange under the symbol CSC.
