CA Performance Handbook

for DB2 for z/OS

About the Contributors from Yevich, Lawson and Associates Inc.
DAN LUKSETICH

is a senior DB2 DBA. He works as a DBA, application architect, presenter, author, and teacher. Dan has over 17 years of experience working with DB2 as a DB2 DBA, application architect, system programmer, and COBOL and BAL programmer, working on major implementations in z/OS, AIX, and Linux environments. His experience includes DB2 application design and architecture, database administration, complex SQL and SQL tuning, performance audits, replication, disaster recovery, stored procedures, UDFs, and triggers.

SUSAN LAWSON

is an internationally recognized consultant and lecturer with a strong background in database and system administration. She works with clients to help develop, implement, and tune some of the world's largest and most complex DB2 databases and applications. She also performs performance audits to help reduce costs through proper performance tuning. She is an IBM Gold Consultant for DB2 and zSeries. She has authored the IBM 'DB2 for z/OS V8 DBA Certification Guide', the 'DB2 for z/OS V7 Application Programming Certification Guide', and the 'DB2 for z/OS V9 DBA Certification Guide' (2007). She also co-authored several books including 'DB2 High Performance Design and Tuning' and 'DB2 Answers', and is a frequent speaker at user group and industry events. (Visit DB2Expert.com)

About CA
CA (NYSE: CA), one of the world's largest information technology (IT) management software companies, unifies and simplifies the management of enterprise-wide IT for greater business results. Our vision, tools and expertise help customers manage risk, improve service, manage costs and align their IT investments with their business needs. CA Database Management encompasses this vision with an integrated and comprehensive solution for database design and modeling, database performance management, database administration, and database backup and recovery across multiple database systems and platforms.

Copyright © 2007 CA. All Rights Reserved. One CA Plaza, Islandia, N.Y. 11749. All trademarks, trade names, service marks, and logos referenced herein belong to their respective companies.

About This Handbook
Originally known as "The DB2 Performance Tool Handbook" by PLATINUM Technologies (PLATINUM was acquired by Computer Associates in 1999), this important update provides information to consider as you approach database performance management, descriptions of common performance and tuning issues during the design, development, and implementation of a DB2 for z/OS application, and specific techniques for performance tuning and monitoring.
• Chapter 1: Provides an overview of database performance tuning
• Chapter 2: Provides descriptions of SQL access paths
• Chapter 3: Describes SQL tuning techniques
• Chapter 4: Explains how to design and tune tables and indexes for performance
• Chapter 5: Describes data access methods
• Chapter 6: Describes how to properly monitor and isolate performance problems
• Chapter 7: Describes DB2 application performance features
• Chapter 8: Explains how to adjust subsystem parameters, and configure subsystem resources, for performance
• Chapter 9: Describes tuning approaches when working with packaged applications
• Chapter 10: Offers tuning tips drawn from real world experience

Who Should Read This Handbook
The primary audiences for this handbook are physical and logical database administrators. Since performance management has many stakeholders, other audiences who will benefit from this information include application developers, data modelers and data center managers. For performance initiatives to be successful, DBAs, developers, and managers must cooperate and contribute to this effort. This is because there is no right and wrong when it comes to performance tuning. There are only trade-offs as people make decisions about database design and performance. Are you designing for ease of use, minimized programming effort, flexibility, availability, or performance? As these choices are made, there are different organizational costs and benefits whether they are measured in the amount of time and effort by the personnel involved in development and maintenance, response time for end users, or CPU metrics. This handbook assumes a good working knowledge of DB2 and SQL, and is designed to help you build good performance into the application, database, and the DB2 subsystem. It provides techniques to help you monitor DB2 for performance, and to identify and tune production performance problems.



CHAPTER 1
Introducing DB2 Performance

Many people spend a lot of time thinking about DB2 performance, while others spend no time at all on the topic. Nonetheless, performance is on everyone's mind when one of two things happen: an online production system does not respond within the time specified in a service level agreement (SLA), or an application uses more CPU (and thus costs more money) than is desired. Most of the efforts involved in tuning during these situations revolve around fire fighting, and once the problem is solved the tuning effort is abandoned.

For database performance management, the philosophy of performance tuning begins with database design, and extends through (database and application) development and production implementation. Performance tuning can be categorized as follows:
• Designing for performance (application and database)
• Testing and measuring performance design efforts
• Understanding SQL performance and tuning SQL
• Tuning the DB2 subsystem
• Monitoring production applications

An understanding of DB2 performance, even at a high level, goes a long way to ensure well performing applications, databases, and subsystems.

Using This Handbook
This handbook provides important information to consider as you approach database performance management, descriptions of common performance and tuning issues during design, development, and implementation of a DB2 for z/OS application, and specific techniques for performance tuning and monitoring. This handbook aims to provide information that performance fire fighters and performance tuners need, to understand how DB2 processes SQL and transactions, and to tune those processes for performance. It also provides the information necessary to design databases for high performance. Throughout this handbook, you'll find:
• Fundamentals of DB2 performance
• Descriptions of features and functions that help you manage and optimize performance
• Methods for identifying DB2 performance problems
• Guidelines for tuning both static and dynamic SQL
• Guidelines for the proper design of tables and indexes for performance

• DB2 subsystem performance information, and advice for subsystem configuration and tuning
• Commentary for proper application design for performance
• Real-world tips for performance tuning and monitoring

The following chapters discuss how to fix various performance problems:
• Chapter 2 provides descriptions of SQL access paths
• Chapter 3 describes SQL tuning techniques
• Chapter 4 explains how to design and tune tables and indexes for performance
• Chapter 5 describes data access methods
• Chapter 6 describes how to properly monitor and isolate performance problems
• Chapter 7 describes DB2 application performance features that you can take advantage of
• Chapter 8 explains how to adjust subsystem parameters, and configure subsystem resources for performance
• Chapter 9 describes actions that can be taken when working with packaged applications
• Chapter 10 offers tuning tips drawn from experience at a variety of implementations

Measuring DB2 Performance
Performance is a relative term. Most people don't inherently care about performance. They want their data, they want it now, they want it accurate, they want it all the time, and they want it to be secured. Of course, once they get these things they are also interested in getting the data fast, and at the lowest cost possible. Enter performance design and tuning.

Getting the best possible performance from DB2 databases and applications means a number of things. First and foremost, it means that users do not need to wait for the information they require to do their jobs. Second, it means that the organization using the application is not spending too much money running the application. If someone tells you they want everything to run as fast as possible, then you should tell them that the solution is to buy an infinite number of machines. In reality, good performance is a balance between expectations and reality. Good performance is not something that is typically delivered at the push of a button. That is, in order for effective performance tuning to take place, proper expectations and efforts should be identified and understood. Only then will the full benefits of performance tuning be realized.

The methods for designing, measuring, and tuning for performance vary from application to application. The optimal approach depends upon the type of application. Is it an online transaction processing system (OLTP)? A data warehouse? A batch processing system?

Your site may use one or all of the following to measure performance:
• User complaints
• Overall response time measurements
• Service level agreements (SLAs)
• System availability
• I/O time and waits
• CPU consumption and associated charges
• Locking time and waits
(See Chapters 2, 3, 5 and 6.)

Designing for Performance
An application can be designed and optimized for high availability, ease of programming, flexibility and reusability of code, or performance. The most thorough application design should take into account all of these things. Major performance problems are caused when performance is entirely overlooked during design, is considered late in the design cycle, or is not properly understood. Designing an application with regards to performance will help avoid many performance problems after the application has been deployed.

Proactive performance engineering allows you to analyze and stress test designs before they are implemented, so that problems can be tested and addressed prior to production. Proactive performance engineering can help eliminate redesign, recoding, and retrofitting during implementation in order to satisfy performance expectations, or alleviate performance problems. Proper techniques (and tools) can turn performance engineering research and testing from a lengthy process inhibiting development to one that can be performed efficiently prior to development.

Possible Causes of DB2 Performance Problems
When the measurements you use at your site warn you of a performance problem, you face the challenge of finding the cause of the problem. Some common causes of performance problems are:
• Poor database design
• Poor application design
• Catalog statistics that do not reflect the data size and/or organization
• Poorly written SQL
• Generic SQL design
• Inadequately allocated system resources, such as buffers and logs

In other situations you may be dealing with packaged software applications that cause performance problems. In these situations it may be difficult to tune the application but quite possible to improve performance through database and subsystem tuning. (See Chapter 9.)

There are many options for proper index and table design for performance. These options have to do specifically with the type of application that is going to be implemented, and specifically with the type of data access (random versus sequential, read versus update, etc.). This involves understanding the application data access patterns, as well as the units of work (UOWs) and inputs. Taking these things into consideration, and designing the application and database accordingly, is the best defense against performance problems. (See Chapters 2-4.)

Finding and Fixing Performance Problems
How do we find performance problems? Are SQL statements in an application causing problems? Does the area of concern involve the design of our indexes and tables? Or, perhaps our data has become disorganized, or the catalog statistics don't accurately reflect the data in the tables? It can also be an issue of dynamic SQL, or perhaps the overhead of preparing the dynamic SQL statements. With the proper monitoring in place, as described in Chapter 6, the problem should be evident.

When it comes to database access, finding and fixing the worst performing SQL statements is the top priority. However, how does one prioritize the worst culprits? Are problems caused by the statements that are performing table space scans, or the ones with matching index access? Performance, of course, is relevant, and finding the least efficient statement is a matter of understanding how that statement is performing relative to the business need that it serves. In addition, it requires an understanding of the application, as well as of its poor performing statements, and that the application be properly designed for performance. (See Chapter 6.)

Fixing the Performance Problems
Once a performance problem has been identified, what is the best approach to address it? Is it a matter of updating catalog statistics? Tuning an SQL statement? Adjusting parameters for indexing, table space, or subsystem? Proper action will be the result of properly understanding the problem first. (See Chapters 4 and 7-10.)

CHAPTER 2
SQL and Access Paths

One of the most important things to understand about DB2 is that DB2 offers a level of abstraction from the data. That is, we don't have to know where exactly the data physically exists. We don't need to know where the datasets are, we don't need to know the access method for the data, and we don't need to know what index to use. All we need to know is the name of the table, and the columns we are interested in. DB2 takes care of the rest. This is a significant business advantage in that we can quickly build applications that access data via a standardized language that is independent of any access method or platform. This enables ultra-fast development of applications, increasing productivity, portability across platforms, and all the power and flexibility that comes with a relational or object-relational design. This means leverage, and leverage goes a long way in the business world. What we pay in return is more consumption of CPU and I/O.

Using DB2 for z/OS can be more expensive than more traditional programming and access methods on the z/OS platform. Yes, VSAM and QSAM are generally cheaper than DB2. Yes, flat file processing can be cheaper than DB2, sometimes! However, with continually shrinking people resources and skill sets on the z/OS platform, utilizing a technology such as DB2 goes a long way in saving programming time and people costs. So, with the increased cost of using DB2 over long standing traditional access methods on the z/OS platform, can we possibly get the flexibility and power of DB2, along with the best performance possible? Yes, we can. All we need is an understanding of the DB2 optimizer, the SQL language, the DB2 access paths, and the trade-offs between performance, flexibility, and portability. Understanding the optimizer, and how your data is accessed, will help you understand and adjust DB2 performance.

The DB2 Optimizer
There is no magic to computer programming. Computers are stupid. They only understand zeros and ones. However, computers fool us in that in spite of their stupidity they are incredibly fast. This speed enables an incredible number of calculations to be performed in a fraction of a second. The DB2 optimizer is no exception. Not to insult the incredible efforts that the IBM DB2 development team has performed in making the DB2 optimizer the most accurate and efficient engine for SQL optimization on the planet! However, the DB2 optimizer (in a simplistic view) simply accepts a statement, checks it for syntax, checks it against the DB2 system catalog, and then builds access paths to the data. It uses catalog statistics to attach a cost to the various possible access paths, and then chooses the cheapest path.

The optimizer is responsible for interpreting your queries, and determining how to access your data in the most efficient manner. The DB2 optimizer is cost based. That is, it is dependent upon the information we give it. This is primarily the information stored in the DB2 system catalog, which includes the basic information about the tables, columns, indexes, and statistics. These are the DB2 catalog statistics. The optimizer uses a variety of statistics to determine the cost of accessing the data in tables. The cost associated with various access paths will affect whether or not indexes are chosen, which indexes are chosen, and the table access sequence for multiple table access paths.

The DB2 optimizer can only deal with the information that it is provided. It does not know what access paths are possible that are not available. That is, it can only utilize the best of the available access paths. It doesn't know about indexes that could be built, or anything about the possible inputs to our queries (unless all literal values are provided in a query), batch sequences, input files, or transaction patterns. This is why an understanding of the DB2 optimizer, statistics, and access paths is very important, but also important is the design of applications and databases for performance. These topics are covered in the remainder of this guide, and so this chapter serves only as an education into the optimizer, and is designed to build a basis upon which proper SQL, database, and application tuning can be conducted.

It should be noted that catalog statistics and database design are not the only things the DB2 optimizer uses to calculate and choose an efficient access path. Additional information, such as central processor model, number of processors, size of the RID pool, and buffer pool sizes, is used to determine the access path. The type of query (static or dynamic), use of literal values, and runtime optimization options will dictate which statistics are used. You should be aware of these factors as you tune your queries.

The Importance of Catalog Statistics
It is possible to maintain information about the amount of data, and the organization of that data, in DB2 tables, and DB2 will utilize these statistics to determine the best way to access the data dependent upon the objects and the query. For this reason maintaining accurate, up to date statistics is critical to performance. These statistics can be cardinality statistics, frequency distribution statistics, or histogram statistics. Each provides different details about the data in your tables, and they are typically collected by the DB2 RUNSTATS utility.

Cardinality Statistics
DB2 needs an accurate estimate of the number of rows that qualify after applying various predicates in your queries in order to determine the optimal access path. Cardinality statistics reflect the number of rows in a table, or the number of distinct values for a column in a table. Column and table cardinalities provide the basic, but critical, information for the optimizer to make these estimates. DB2 will use these statistics to determine such things as whether access is via an index scan or a table space scan, and, when joining tables, which table to access first. These statistics provide the main source of what is known as the filter factor, a percentage of the number of rows expected to be returned, for a column or a table.

For example, for the following statement embedded in a COBOL program:

SELECT EMPNO
FROM   EMP
WHERE  SEX = :SEX

DB2 has to choose the most efficient way to access the EMP table. This will depend upon the size of the table, as well as the cardinality of the SEX column, and any index that might be defined upon the table (especially one on the SEX column). It will use the catalog statistics to determine the filter factor for the SEX column. If the column cardinality of the SEX column of the EMP table is 3 (male, female, unknown), then we know there are 3 unique occurrences of a value of SEX in the table. In this case, DB2 determines a filter factor of 1/3, or 33% of the values, using the formula 1/COLCARDF, COLCARDF being a column in the SYSIBM.SYSCOLUMNS catalog table. The resulting fractional number represents the portion of rows that are expected to be returned based upon the predicate. DB2 will use this number, the filter factor, to make decisions as to whether or not to use an index (if available) or a table space scan. If the table cardinality, as reflected in the CARDF column of the SYSIBM.SYSTABLES catalog table, is 10,000 employees, then the estimated number of rows returned from the query will be 3,333. DB2 will use this information to make decisions about which access path to choose.

When multiple tables are accessed, the number of qualifying rows estimated for the tables can also affect the table access sequence. If the predicate in the query above is used in a join to another table, then the filter factor calculated will be used not only to determine the access path to the EMP table, but could also influence which table will be accessed first in the join, as in this particular case:

SELECT E.EMPNO, D.DEPTNAME
FROM   EMP E
INNER JOIN DEPT D
ON     E.WORKDEPT = D.DEPTNO
WHERE  E.LASTNAME = :LAST-NAME
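The cardinalities the optimizer works from can be checked directly in the catalog. The following query is a minimal sketch against the catalog tables named above; the EMP table from the DB2 sample database is used purely as an illustration, and you would qualify by your own creator as needed:

SELECT T.CARDF    AS TABLE_CARD,
       C.COLCARDF AS COLUMN_CARD
FROM   SYSIBM.SYSTABLES  T,
       SYSIBM.SYSCOLUMNS C
WHERE  C.TBCREATOR = T.CREATOR
AND    C.TBNAME    = T.NAME
AND    T.NAME      = 'EMP'
AND    C.NAME      = 'SEX'

A value of -1 in either column indicates that RUNSTATS has never collected statistics for the object, which leaves the optimizer working from default filter factors.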

In addition to normal cardinality statistics, statistics can be gathered upon groups of columns, which is otherwise known as column correlation statistics. This is very important to know because, again, the DB2 optimizer can only deal with the information it is given. Say, for example, you have the following query:

SELECT E.EMPNO
FROM   EMP E
WHERE  SALARY > :SALARY
AND    EDLEVEL = :EDLEVEL

Now, imagine in a truly hypothetical example that the amount of money that someone is paid is correlated to the amount of education they have received. For example, an intern being paid minimum wage won't be paid as much as an executive with an MBA. However, if the DB2 optimizer is not given this information it has to multiply the filter factor for the SALARY predicate by the filter factor for the EDLEVEL predicate. This may result in an exaggerated filter factor for the table, and could negatively influence the choice of index, or the table access sequence of a join. For this reason, it is important to gather column correlation statistics on any columns that might be correlated. More conservatively, it may be wise to gather column correlation statistics on any columns referenced together in the WHERE clause.
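Column correlation statistics are gathered with the COLGROUP option of the RUNSTATS utility. The control statement below is a sketch only; the database, table space, and creator names are those of the DB2 sample database and are purely illustrative:

RUNSTATS TABLESPACE DSN8D91A.DSN8S91E
  TABLE(DSN8910.EMP)
  COLGROUP(SALARY, EDLEVEL)

This collects the combined cardinality of the (SALARY, EDLEVEL) column group, so the optimizer can use a single filter factor for the pair instead of multiplying the two individual filter factors together.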

Frequency Distribution Statistics
In some cases cardinality statistics are not enough. With only cardinality statistics the DB2 optimizer has to make assumptions about the distribution of data in a table. That is, it has to assume that the data is evenly distributed. Take, for example, the SEX column from the previous queries. The column cardinality is 3; however it is quite common that the values in the SEX column will be 'M' for male, or 'F' for female, with each representing approximately 50% of the values. This is an example of what we will call skewed distribution. Skewed distributions can have a negative influence on query performance if DB2 does not know about the distribution of the data in a table and/or the input values in a query predicate. In the following query:

SELECT EMPNO
FROM   EMP
WHERE  SEX = :SEX

DB2 has no idea what the input value of the :SEX host variable is, and so it uses the default formula of 1/COLCARDF. Likewise, if the following statement is issued:

SELECT EMPNO
FROM   EMP
WHERE  SEX = 'U'

and most of the values are 'M' or 'F', the value 'U' occurs only a small fraction of the time, and DB2 has only cardinality statistics available, then the same formula and filter factor applies. This is where distribution statistics can pay off.

DB2 has the ability to collect distribution statistics via the RUNSTATS utility. These statistics reflect the percentage of frequently occurring values. That is, you can collect information about the percentage of values that occur most or least frequently in a column in a table. So, if we gathered frequency distribution statistics for our SEX column in the previous examples, we may find the following frequency values in the SYSIBM.SYSCOLDIST table (simplified in this example):

VALUE   FREQUENCY
'M'     .49
'F'     .49
'U'     .01

Now, if DB2 has this information, and is provided with the following query:

SELECT EMPNO
FROM   EMP
WHERE  SEX = 'U'

it can determine that the value 'U' represents about 1% of the values of the SEX column. This additional information can have dramatic effects on access path and table access sequence decisions. These frequency distribution statistics can have a dramatic impact on the performance of the following types of queries:
• Dynamic or static SQL with embedded literals
• Static SQL with host variables bound with REOPT(ALWAYS)
• Static or dynamic SQL against nullable columns with host variables that cannot be null
• Dynamic SQL with parameter markers bound with REOPT(ONCE), REOPT(ALWAYS), or REOPT(AUTO)
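Frequencies like those shown above are collected with the FREQVAL option of RUNSTATS. The following control statement is a sketch only, again using illustrative sample database names; FREQVAL COUNT 3 requests the three most frequently occurring values of the column group:

RUNSTATS TABLESPACE DSN8D91A.DSN8S91E
  TABLE(DSN8910.EMP)
  COLGROUP(SEX) FREQVAL COUNT 3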

Histogram Statistics
Histogram statistics improve on distribution statistics in that they provide value-distribution statistics that are collected over the entire range of values in a table. This can go beyond the distribution statistics in that all values are represented, spanning the entire distribution of values in a table. It also goes beyond the capabilities of distribution statistics in that distribution statistics are limited to only the most or least occurring values in a table (unless they are, of course, theoretically, collected all the time for every single column, which can be extremely expensive and difficult). To be more specific, histogram statistics summarize data distribution on an interval scale. Histogram statistics divide values into quantiles. A quantile defines an interval of values, and the quantity of intervals is determined via a specification of quantiles in a RUNSTATS execution. Thus a quantile will represent a range of values within the table. Each quantile will contain approximately the same percentage of rows.

Now, the previous example is no good because we had only three values for the SEX column of the EMP table, so we'll use a hypothetical example with our EMP sample table. Let's suppose that we needed to query on the middle initial of people's names. Those initials are most likely in the range of 'A' to 'Z'. If we had only cardinality statistics then any predicate referencing the middle initial column would receive a filter factor of 1/COLCARDF, or 1/26. If we had distribution statistics then any predicate that used a literal value could take advantage of the distribution of values to determine the best access path. That determination would be dependent upon the value being recorded as one of the most or least occurring values. If it is not, then the filter factor is determined based upon the difference in frequency of the remaining values. Histogram statistics eliminate this guesswork. If we calculated histogram statistics for the middle initial, they may look something like this:

QUANTILE   LOWVALUE   HIGHVALUE   CARDF   FREQ
1          A          G           5080    20%
2          H          L           4997    19%
3          M          Q           5001    20%
4          R          U           4900    19%
5          V          Z           5100    20%


Now, the DB2 optimizer has information about the entire span of values in the table, and any possible skew. This is especially useful for range predicates such as this:

SELECT EMPNO
FROM   EMP
WHERE  MIDINIT BETWEEN 'A' AND 'G'
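Histogram statistics are also requested on a column group. A DB2 9 RUNSTATS control statement along these lines (sample database names, and five quantiles as in the example above) is a sketch of how the statistics shown might be collected:

RUNSTATS TABLESPACE DSN8D91A.DSN8S91E
  TABLE(DSN8910.EMP)
  COLGROUP(MIDINIT) HISTOGRAM NUMQUANTILES 5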

Access Paths
There are a number of access paths that DB2 can choose when accessing the data in your database. This section describes those access paths, as well as when they are effective. The access paths are exposed via use of the DB2 EXPLAIN facility. The EXPLAIN facility can be invoked via one of these techniques:
• EXPLAIN SQL statement
• EXPLAIN(YES) BIND or REBIND parameter
• Visual Explain (DB2 V7, DB2 V8, DB2 9)
• Optimization Service Center (DB2 V8, DB2 9)

Table Space Scan
If chosen, DB2 will scan an entire table space in order to satisfy a query. What is actually scanned depends upon the type of table space. Simple table spaces (supported in DB2 9, but you can't create them) will be scanned in their entirety. Segmented and universal table spaces will be scanned for only the table specified in the query (or for the space maps that qualify). Nonetheless, a table space scan will search the entire table space to satisfy a query. A table space scan is indicated by an "R" in the PLAN_TABLE ACCESSTYPE column. A table space scan will be selected as an access method when:
• A matching index scan is not possible because no index is available, or no predicates match the index column(s)
• DB2 calculates that a high percentage of rows will be retrieved, and in this case any index access would not be beneficial
• The indexes that have matching predicates have a low cluster ratio and are not efficient for large amounts of data that the query is requesting
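To confirm which access path was chosen, you can EXPLAIN a statement and query the PLAN_TABLE (assumed here to exist under your authorization ID; QUERYNO 100 is an arbitrary tag). An ACCESSTYPE of "R" in the result indicates a table space scan:

EXPLAIN PLAN SET QUERYNO = 100 FOR
SELECT EMPNO, LASTNAME
FROM   EMP
WHERE  WORKDEPT = 'C01';

SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, ACCESSTYPE,
       MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
FROM   PLAN_TABLE
WHERE  QUERYNO = 100
ORDER BY QBLOCKNO, PLANNO;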




A table space scan is not necessarily a bad thing. It depends upon the nature of the request and the frequency of access. Sure, running a table space scan every one second in support of online transactions is probably a bad thing. A table space scan in support of a report that spans the content of an entire table once per day is probably a good thing.

Limited Partition Scan and Partition Elimination
DB2 can take advantage of the partitioning key values to eliminate partitions from a table space scan, or via index access. DB2 utilizes the column values that define the partitioning key to eliminate partitions from the query access path. Queries have to be coded explicitly to eliminate partitions. This partition elimination can apply to table space scans, as well as index access. Suppose, for example, that the EMP table is partitioned by the EMPNO column (which it is in the DB2 sample database). The following query would eliminate any partitions not qualified by the predicate:

SELECT EMPNO, LASTNAME
FROM   EMP
WHERE  EMPNO BETWEEN '200000' AND '299999'

DB2 can choose an index-based access method if an index is available on the EMPNO column. DB2 could also choose a table space scan, but utilize the partitioning key values to access only the partitions in which the predicate matches. DB2 can also choose a secondary index for the access path, and still eliminate partitions using the partitioning key values if those values are supplied in a predicate. For example, in the following query:

SELECT EMPNO, LASTNAME
FROM   EMP
WHERE  EMPNO BETWEEN '200000' AND '299999'
AND    WORKDEPT = 'C01'

DB2 could choose a partitioned secondary index on the WORKDEPT column, and eliminate partitions based upon the range provided in the query for the EMPNO.



DB2 can employ partition elimination for predicates coded against the partitioning key using:
• Literal values (DB2 V7, DB2 V8, DB2 9)
• Host variables (DB2 V7 with REOPT(VARS), DB2 V8, DB2 9)
• Joined columns (DB2 9)

Index Access
DB2 can utilize indexes on your tables to quickly access the data based upon the predicates in the query, and to avoid a sort in support of the DISTINCT clause, as well as for the ORDER BY and GROUP BY clauses, and for INTERSECT and EXCEPT processing. There are several types of index access. The index access is indicated in the PLAN_TABLE by an ACCESSTYPE value of either I, I1, N, MX, or DX. DB2 will match the predicates in your queries to the leading key columns of the indexes defined against your tables. This is indicated by the MATCHCOLS column of the PLAN_TABLE. If the number of columns matched is greater than zero, then the index access is considered a matching index scan. If the number of matched columns is equal to zero, then the index access is considered a non-matching index scan. In a non-matching index scan all of the key values and their record identifiers (RIDs) are read. A non-matching index scan is typically used if DB2 can use the index in order to avoid a sort when the query contains an ORDER BY, DISTINCT, or GROUP BY clause, and also possibly for an INTERSECT or EXCEPT. The matching predicates on the leading key columns are equal (=) or IN predicates. This would correspond to an ACCESSTYPE value of either "I" or "N", respectively. The predicate that matches the last index column can be an equal, IN, NOT NULL, or a range predicate (<, <=, >, >=, LIKE, or BETWEEN). For example, for the following query assume that the EMPPROJACT DB2 sample table has an index on the PROJNO, ACTNO, EMSTDATE, and EMPNO columns:

SELECT EMENDATE, EMPTIME
FROM   EMPPROJACT
WHERE  PROJNO = 'AD3100'
AND    ACTNO IN (10, 60)
AND    EMSTDATE > '2002-01-01'
AND    EMPNO = '000010'



In the above example DB2 could choose a matching index scan, matching on the PROJNO, ACTNO, and EMSTDATE columns. It does not match on the EMPNO column due to the fact that the previous predicate is a range predicate. However, since it is a stage 1 predicate (predicate stages are covered in Chapter 3), DB2 can apply it to the index entries after the index is read. That is, although the EMPNO column does not limit the range of entries retrieved from the index, it can eliminate entries as the index is read, and therefore reduce the number of data rows that have to be retrieved from the table. So, the EMPNO column can be applied as an index screening column.

If all of the columns specified in a statement can be found in an index, then DB2 may choose index only access. This is indicated in the PLAN_TABLE with the value "Y" in the INDEXONLY column.

DB2 is also able to access indexes via multi-index access. A multi-index access involves reading multiple indexes, or the same index multiple times, gathering qualifying RID values together in a union or intersection of values (depending upon the predicates in the SQL statement), then sorting them in data page number sequence, and then accessing the data. This is indicated via an ACCESSTYPE of "M" in the PLAN_TABLE. With multi-index access each operation is represented in the PLAN_TABLE in an individual row. These steps include the matching index access (ACCESSTYPE = "MX" or "DX") for regular indexes or DOCID XML indexes, and then unions or intersections of the RIDs (ACCESSTYPE = "MU", "MI", "DU", or "DI"). In this way a table can still be accessed efficiently for more complicated queries. The following example demonstrates an SQL statement that can perhaps utilize multi-index access if there is an index on the LASTNAME and FIRSTNME columns of the EMP table:

SELECT EMPNO
FROM   EMP
WHERE  (LASTNAME = 'HAAS' AND FIRSTNME = 'CHRISTINE')
OR     (LASTNAME = 'CHRISTINE' AND FIRSTNME = 'HAAS')

With the above query DB2 could possibly access the index twice, union the resulting RID values together (due to the OR condition in the WHERE clause), and then access the data.
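For the example above to be eligible for multi-index access an index on the name columns has to exist. The definition below is hypothetical (the index name is invented); note that including EMPNO as a trailing key column would also allow the query to be satisfied with index only access (INDEXONLY = "Y"), since every column the statement references would then be in the index:

CREATE INDEX XEMP_NAME
  ON EMP (LASTNAME, FIRSTNME, EMPNO)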

List Prefetch
The preferred method of table access, when access is via an index, is by utilizing a clustering index. When an index is defined as clustered, DB2 will attempt to keep the data in the table in the same sequence as the key values in the clustering index. Therefore, when the table is accessed via the clustering index, and the data in the table is well organized (as represented by the catalog statistics), the access to the data in the table will typically be sequential and predictable. When accessing the data in a table via a non-clustering index, or via a clustering index that has a low cluster ratio (the table data is disorganized), the access to the data can be very random and unpredictable, and the same data pages could be read multiple times.

When DB2 detects that access to a table will be via a non-clustering index, or via the clustering index with a low cluster ratio, then DB2 may choose an index access method called list prefetch. List prefetch will access the index via a matching index scan of one or more indexes, and collect the RIDs into the RID pool, which is a common area of memory. The RIDs are then sorted in ascending order by the page number, and the pages in the table space are then retrieved in a sequential or skip sequential fashion. List prefetch will also be used as part of the access path for multi-index access and access to the inner table of a hybrid join. List prefetch is indicated in the PLAN_TABLE via a value of "L" in the PREFETCH column.

In general, list prefetch is useful for access to a moderate number of rows when access to the table is via a non-clustering or clustering index when the data is disorganized (low cluster ratio). It is usually less useful for queries that process large quantities of data or very small amounts of data. List prefetch will not be used if the matching predicates include an IN-list predicate. List prefetch will not be chosen as the access path if DB2 determines that the RIDs to be processed will take more than 50% of the RID pool when the query is executed. In addition, list prefetch can terminate during execution if DB2 determines that more than 25% of the rows of the table must be accessed. In these situations (called RDS failures) the access path changes to a table space scan.

Nested Loop Join
In a nested loop join DB2 will scan the outer table (the first table accessed in the join operation) via a table space scan or index access, and then for each row in that table that qualifies DB2 will search for matching rows in the inner table (the second table accessed). It will concatenate any rows it finds in the inner table with the outer table. Nested loop join is indicated via a value of "1" in the METHOD column of the PLAN_TABLE.

A nested loop join will repeatedly access the inner table as rows are accessed in the outer table. Therefore, the nested loop join is most efficient when a small number of rows qualify for the outer table, and is a good access path for transactions that are processing little or no data. It is important that there exist an index that supports the join on the inner table, and for the best performance that index should be a clustering index in the same order as the outer table. This is especially true if the query will be retrieving more than a trivial amount of data. DB2 (DB2 9) can also dynamically build a sparse index on the inner table when no index is available on the inner table. The nested loop join is also the method selected by DB2 when you are joining two tables together and do not specify join columns. In these cases it is likely that DB2 will choose a nested loop join as the access method for small amounts of data.

Hybrid Join
The preferred method for a join is to join two tables together via a common clustering index. If DB2 detects that the inner table access will be via a non-clustering index, or via a clustering index in which the cluster ratio is low, or the join columns of the outer table are not in an index or are in a clustering index where the table data is disorganized, then DB2 may choose a hybrid join as the access method. In a hybrid join DB2 will scan the outer table either via a table space scan or via an index, and then join the qualifying rows of the outer table with the RIDs from the matching index entries. It may also sort the outer table if it determines that the join columns are not in the same sequence as the inner table. DB2 also creates a RID list for all the qualifying RIDs of the inner table. Then DB2 will sort the outer table and RIDs, creating a sorted RID list and an intermediate table. DB2 will then access the inner table via list prefetch, and join the inner table to the intermediate table. In this way the join operation can be a sequential and efficient process. The hybrid join is indicated via a value of "4" in the METHOD column of the PLAN_TABLE.

Hybrid join takes advantage of the list prefetch processing on the inner table to read all the data pages only once in a sequential or skip sequential manner. Hybrid join can outperform a nested loop join when the inner table access is via a non-clustering index or a clustering index for a disorganized table. That is, the hybrid join is better if the equivalent nested loop join would be accessing the inner table in a random fashion, and thus accessing the same pages multiple times. Therefore, hybrid join is better when the query processes more than a tiny amount of data (nested loop is better in this case), or less than a very large amount of data (merge scan is better in this case). Since list prefetch is utilized, the hybrid join can experience RID list failures, as list prefetch does. This can result in a table space scan of the inner table on initial access to the inner table, or a restart of the list prefetch processing upon subsequent accesses (all within the same query). If you are experiencing RID list failures in a hybrid join then you should attempt to encourage DB2 to use a nested loop join.

Merge Scan Join
If DB2 detects that a large number of rows of the inner and outer table will qualify for the join, or the tables have no indexes on the matching columns of the join, then DB2 may choose a merge scan join as the access method. In a merge scan join DB2 will scan both tables in the order of the join columns. If there are no efficient indexes providing the join order for the join columns, then DB2 may sort the outer table, the inner table, or both tables. The inner table is placed into a work file, and the outer table will be placed into a work file if it has to be sorted. DB2 will then read both tables, and join the rows together via a match/merge process. A merge scan join is indicated via a value of "2" in the METHOD column of the PLAN_TABLE.

Merge scan join is best for large quantities of data. The general rule should be that if you are accessing small amounts of data in a transaction then nested loop join is best. Processing large amounts of data in a batch program or a large report query? Then merge scan join is generally the best.

Star Join
A star join is another join method that DB2 will employ in special situations in which it detects that the query is joining tables that are a part of a star schema design of a data warehouse. The star join is indicated via the value of "S" in the JOIN_TYPE column of the PLAN_TABLE. The star join is best described in the DB2 Administration Guide (DB2 V7, DB2 V8) or the DB2 Performance Monitoring and Tuning Guide (DB2 9).

Read Mechanisms
DB2 can apply some performance enhancers when accessing your data. These performance enhancers can have a dramatic effect on the performance of your queries and applications.

Sequential Prefetch
When your query reads data in a sequential manner, DB2 can elect to use sequential prefetch to read the data ahead of the query requesting it. DB2 can launch asynchronous read engines to read data from the index and table page sets into the DB2 buffers. This hopefully allows the data pages to already be in the buffers in expectation of the query accessing them. The maximum number of pages read by a sequential prefetch operation is determined by the size of the buffer pool used. If the buffer pool has more than 1000 pages then 32 pages can be read with a single physical I/O. When the buffer pool is smaller than 1000 pages, the prefetch quantity is limited due to the risk of filling up the buffer pool. Sequential prefetch is indicated via a value of "S" in the PREFETCH column of the PLAN_TABLE. DB2 9 will only use sequential prefetch for a table space scan. Otherwise, DB2 9 will rely primarily on dynamic prefetch.

Dynamic Prefetch and Sequential Detection
DB2 can perform sequential prefetch dynamically at execution time. This is called dynamic prefetch. It employs a technique called sequential detection in order to detect the forward reading of table pages and index leaf pages. DB2 will track the pattern of pages accessed in order to detect sequential or near sequential access. The most recent eight pages are tracked, and data access is determined to be sequential if more than 4 of the last eight pages are page-sequential. A page is considered page-sequential if it is within one half of the prefetch quantity for the buffer pool. That is, it is not five of eight sequential pages that activates dynamic prefetch, but instead five of eight sequentially available pages. As DB2 can activate dynamic prefetch, it can also deactivate dynamic prefetch if it determines that the query or application is no longer reading sequentially. Dynamic prefetch can be activated for single cursors, or for repetitive queries within a single application thread. This could improve the performance of applications that issue multiple SQL statements that access the data as a whole sequentially. It is also important, of course, to cluster tables that are commonly accessed together to possibly take advantage of sequential detection. DB2 will also indicate in an access path if a query is likely to get dynamic prefetch at execution time. This is indicated via a value of "D" in the PREFETCH column of the PLAN_TABLE.

It should be noted that dynamic prefetch is dependent upon the bind parameter RELEASE(DEALLOCATE). This is because the area of memory used to track the last eight pages accessed is destroyed and rebuilt for RELEASE(COMMIT). It should also be noted that RELEASE(DEALLOCATE) is not effective for remote connections, and so sequential detection and dynamic prefetch are less likely for remote applications.

Index Lookaside
Index lookaside is another powerful performance enhancer that is active only after an initial access to an index. Index lookaside depends on the query reading index data in a predictable pattern, much in the same way as sequential detection. For repeated probes of an index DB2 will check the most current index leaf page to see if the key is found on that page. If it is, then DB2 will not read from the root page of the index down to the leaf page, but simply use the values from the leaf page. This can have a dramatically positive impact on performance in that getpage operations and I/Os can be avoided. Typically this is happening for transactions that are accessing the index via a common cluster with other tables. Index lookaside is also dependent upon the RELEASE(DEALLOCATE) bind parameter.

Sorting
DB2 may have to invoke a sort in support of certain clauses in your SQL statement, or in support of certain access paths. DB2 can sort in support of an ORDER BY or GROUP BY clause, to remove duplicates (DISTINCT, EXCEPT, or INTERSECT), or in join or subquery processing. In general, sorting in an application will always be faster than sorting in DB2 for small result sets. However, by placing the sort in the program you remove some of the flexibility, power, and portability of the SQL statement, along with the ease of coding. Perhaps instead you can take advantage of the ways in which DB2 can avoid a sort.

Avoiding a Sort for an ORDER BY
DB2 can avoid a sort for an ORDER BY by utilizing the leading columns of an index to match the ORDER BY columns, if those columns are the leading columns of the index used to access a single table, or the leading columns of the index used to access the first table in a join operation. Suppose for the following query we have an index on (WORKDEPT, LASTNAME, FIRSTNME):

SELECT *
FROM   EMP
WHERE  WORKDEPT = 'D01'
ORDER BY WORKDEPT, LASTNAME, FIRSTNME

In the above query the sort will be avoided if DB2 uses the index on those three columns to access the table. DB2 can also avoid a sort if any number of the leading columns of an index match the ORDER BY clause, or if the query contains an equals predicate on the leading columns (ORDER BY pruning). So, both of the following queries will avoid a sort if the index is used:

SELECT *
FROM   EMP
ORDER BY WORKDEPT

SELECT *
FROM   EMP
WHERE  WORKDEPT = 'D01'
ORDER BY WORKDEPT, LASTNAME
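The sort avoidance examples in this section assume a composite index whose key is (WORKDEPT, LASTNAME, FIRSTNME). A hypothetical definition (the index name is invented) would be:

CREATE INDEX XEMP_DEPT_NAME
  ON EMP (WORKDEPT, LASTNAME, FIRSTNME)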

Avoiding a Sort for a GROUP BY
A GROUP BY can utilize a unique or duplicate index, as well as a full or partial index key, to avoid a sort. The grouped columns must match the leading columns of the index, and any missed leading columns must be in equals or IN predicates in the WHERE clause. Assuming again an index on (WORKDEPT, LASTNAME, FIRSTNME), each of the following statements can avoid a sort if the index is used:

SELECT WORKDEPT, COUNT(*)
FROM   EMP
GROUP BY WORKDEPT

SELECT WORKDEPT, LASTNAME, FIRSTNME, COUNT(*)
FROM   EMP
GROUP BY WORKDEPT, LASTNAME, FIRSTNME

SELECT WORKDEPT, FIRSTNME, COUNT(*)
FROM   EMP
WHERE  LASTNAME = 'SMITH'
GROUP BY WORKDEPT, FIRSTNME

Avoiding a Sort for a DISTINCT
A DISTINCT can avoid a sort only by utilizing a unique index, if that index is used to access the table. You could possibly utilize a GROUP BY on all of the columns in the SELECT list to take advantage of the additional capabilities of GROUP BY to avoid a sort. So, with this in mind the following two statements are identical:

SELECT DISTINCT WORKDEPT, LASTNAME, FIRSTNME
FROM   EMP

SELECT WORKDEPT, LASTNAME, FIRSTNME
FROM   EMP
GROUP BY WORKDEPT, LASTNAME, FIRSTNME

When using this technique, make sure you properly document it so other programmers understand why you are grouping on all columns.

Avoiding a Sort for a UNION, EXCEPT, or INTERSECT
DB2 can possibly avoid a sort for an EXCEPT or INTERSECT. A sort could be avoided if there is a matching index with the unioned/intersected/excepted columns, or if DB2 realizes that the inputs to the operation are already sorted by a previous operation. Sort cannot be avoided for a UNION. So, you have to ask yourself, "are duplicates possible?" If not, then change each to a UNION ALL, EXCEPT ALL, or INTERSECT ALL. If duplicates are possible, you might consider whether the duplicates are produced by only one subselect of the query. If so, then perhaps a GROUP BY can be used on all columns in those particular subselects, since GROUP BY has a greater potential to avoid a sort.

Parallelism
DB2 can utilize something called parallel operations for access to your tables and/or indexes. This can have a dramatic impact on performance for queries that process large quantities of data across multiple table and index partitions. The response time for these data or processor intensive queries can be dramatically reduced. There are two types of query parallelism: query I/O parallelism and query CP parallelism.

I/O parallelism manages concurrent I/O requests for a single query. It can fetch data using these multiple concurrent I/O requests into the buffers, and significantly reduce the response time for large, I/O bound queries. Query CP parallelism can break a large query up into smaller queries, and then run the smaller queries in parallel on multiple processors. This can also significantly reduce the elapsed time for large queries. DB2 can also utilize something called Sysplex query parallelism. In this situation DB2 can split a query across multiple members of a data sharing group to utilize all of the processors on the members for extreme query processing of very large queries.

You can enable query parallelism by utilizing the DEGREE(ANY) parameter of a BIND command, or by setting the value of the CURRENT DEGREE special register to the value of 'ANY'. Be sure, however, that you utilize query parallelism for larger queries. For small queries this can actually be a performance detriment, as there is overhead to the start of the parallel tasks.
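For dynamic SQL the special register can be set before the query runs; for static SQL the equivalent is to bind or rebind the plan or package with DEGREE(ANY). A minimal sketch for the dynamic case:

SET CURRENT DEGREE = 'ANY';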


CHAPTER 3
Predicates and SQL Tuning

This chapter will address the fundamentals of SQL performance and SQL processing in DB2. In order to write efficient SQL statements, and to tune SQL statements, we need a basic understanding of how DB2 optimizes SQL statements, and how it accesses your data. We also need to understand how to reduce repeat processing. When writing and tuning SQL statements, we should maintain a simple and important philosophy: "filter as much as possible as early as possible". Keep in mind that the most efficient SQL statement is the one that never executes. That is, we should only do absolutely what is necessary to get the job done. We have to keep in mind that the most expensive thing we can do is to travel from our application to DB2. Once we are processing inside DB2 we need to do our filtering as close to the indexes and data as possible.

Types of DB2 Predicates
Predicates in SQL statements fall under classifications. These classifications dictate how DB2 processes the predicates, and how much data is filtered, and when, during the process. These classifications are:
• Stage 1 Indexable
• Stage 1
• Stage 2
• Stage 3

You can apply a very simple philosophy in order to understand which of your predicates might fall within one of these classifications. The DB2 stage 1 engine understands your indexes and tables, and can utilize an index for efficient access to your data. Only a stage 1 predicate can limit the range of data accessed on a disk. Data from stage 1 is passed to stage 2 for further processing. The stage 2 engine processes functions and expressions, but is not able to directly access data in indexes and tables. Stage 2 predicates cannot utilize an index, and thus cannot limit the range of data retrieved from disk, and so stage 1 predicates are generally more efficient than stage 2 predicates. Finally, a stage 3 predicate is a predicate that is processed in the application layer. That is, filtering performed once the data is retrieved from DB2 and processed in the application. Stage 3 predicates are the least efficient.

There is a table in the DB2 Administration Guide (DB2 V7 and DB2 V8) or the DB2 Performance Monitoring and Tuning Guide (DB2 9) that lists the various predicate forms, and whether they are stage 1 indexable, stage 1, or stage 2. Only examples are given in this guide. The first thing that determines if a predicate is stage 1 is the syntax of the predicate, and then secondly, it is the type and length of constants used in the predicate. This is one area which has improved dramatically with version 8, in that more data types can be promoted inside stage 1, thus improving stage 1 matching. But just because a predicate is listed in the chart as stage 1 indexable does not mean it will be used for either an index or processed in stage 1, as there are many other factors that must also be in place. All indexable predicates are stage 1, but not all stage 1 predicates are indexable. If the predicate, even though it would classify as a stage 1 predicate, is evaluated after a join operation, then it is a stage 2 predicate.

Stage 1 Indexable
The best performance for filtering will occur when stage 1 indexable predicates are used. Stage 1 indexable predicates are predicates that can be used to match on the columns of an index. You should try to utilize stage 1 matching predicates whenever possible. The stage 1 indexable predicate, when an index is utilized, provides the best filtering in that it can actually limit the range of data accessed from the index. The most simple example of a stage 1 predicate would be of the form <col op value>, where col is a column of a table, op is an operator (=, >, <, >=, <=), and value represents a non-column expression (an expression that does not contain a column from the table). Predicates containing BETWEEN, IN (for a list of values), and LIKE (without a leading search character) can also be stage 1 indexable. Assuming that an index exists on the EMPNO column of the EMP table, the predicate in the following query is a stage 1 indexable predicate:

SELECT LASTNAME, FIRSTNME
FROM   EMP
WHERE  EMPNO = '000010'

Other Stage 1

Just because a predicate is stage 1 does not mean that it can utilize an index. Some stage 1 predicates are not available for index access. These predicates (again, the best reference is the chart in the IBM manuals) are generally of the form <col NOT op value>, where col is a column of a table, op is an operator, and value represents a non-column expression, host variable, or value. Predicates containing NOT BETWEEN, NOT IN (for a list of values), NOT LIKE, or LIKE (with a leading wildcard character), among others, can also be stage 1 but non-indexable. Although non-indexable stage 1 predicates cannot limit the range of data read from an index, they are available as index screening predicates.

A predicate can also appear to be stage 1, but be processed as stage 2. One example is a range predicate comparing a character column to a character value that exceeds the length of the column. Also, although DB2 does a good job of promoting mismatched data types to stage 1 via casting (as of version 8), some predicates with mismatched data types are stage 2.

The following is an example of a non-indexable stage 1 predicate:

SELECT LASTNAME, FIRSTNME
FROM   EMP
WHERE  EMPNO <> '000010'

Stage 2

The DB2 manuals give a complete description of when a predicate is stage 2 versus stage 1. Stage 2 predicates cannot take advantage of indexes to limit the data access, and are generally more expensive than stage 1 predicates because they are evaluated later in the processing. Generally speaking, stage 2 happens after data access, and performs such things as sorting and evaluation of functions and expressions. Stage 2 predicates are generally those that contain column expressions, correlated subqueries, and CASE expressions. Also, any predicate processed after a join operation is stage 2, even though it would classify as a stage 1 predicate. The following are examples of stage 2 predicates (EMPNO is a character column of fixed length 6):

SELECT LASTNAME, FIRSTNME
FROM   EMP
WHERE  EMPNO > '00000010'

SELECT LASTNAME, FIRSTNME
FROM   EMP
WHERE  SUBSTR(LASTNAME,1,1) = 'B'

Stage 3

A stage 3 predicate is a fictitious predicate we've made up to describe filtering that occurs after the data has been retrieved from DB2. That is, filtering that is done in the application program. Performance is best served when all filtering is done in the data server, and not in the application code. You should have no stage 3 predicates.

Imagine for a moment that a COBOL program has read the EMP table. After reading the EMP table, the following statement is executed:

IF (BONUS + COMM) < (SALARY * .10) THEN
   CONTINUE
ELSE
   PERFORM PROCESS-ROW
END-IF.

This is an example of a stage 3 predicate.

The data retrieved from the table isn't used unless the condition is false. Why did this happen? Because the programmer was told that no stage 2 predicates are allowed in SQL statements, as part of a shop standard. The truth is, however, that the following stage 2 predicate would always outperform a stage 3 predicate:

SELECT EMPNO
FROM   EMP
WHERE  BONUS + COMM >= SALARY * 0.1

Combining Predicates

It should be known that when simple predicates are connected together by an OR condition, the resulting compound predicate will be evaluated at the higher stage of the two simple predicates. The following example contains two simple predicates combined by an OR. The first predicate is stage 1 indexable, and the second is non-indexable stage 1. The result is that the entire compound predicate is stage 1 and not indexable:

SELECT EMPNO
FROM   EMP
WHERE  WORKDEPT = 'C01'
OR     SEX <> 'M'

In the next example the first simple predicate is stage 1 indexable, but the second (connected again by an OR) is stage 2. Thus, the entire compound predicate is stage 2:

SELECT EMPNO
FROM   EMP
WHERE  WORKDEPT = 'C01'
OR     SUBSTR(LASTNAME,1,1) = 'L'

Boolean Term Predicates

Simply put, a Boolean term predicate is a simple or compound predicate that, when evaluated false for a row, makes the entire WHERE clause evaluate to false. This is important because only Boolean term predicates can be considered for single index access. Non-Boolean term predicates can at best be considered for multi-index access. For example, the following WHERE clause contains Boolean term predicates:

WHERE LASTNAME = 'HAAS'
AND   MIDINIT = 'T'

If the predicate on the LASTNAME column evaluates as false, then the entire WHERE clause is false. The same is true for the predicate on the MIDINIT column. DB2 can take advantage of this in that it could utilize an index (if available) on either the LASTNAME column or the MIDINIT column. This is opposed to the following WHERE clause:

WHERE LASTNAME = 'HAAS'
OR    MIDINIT = 'T'

In this case, if the predicate on LASTNAME or MIDINIT evaluates as false, then the rest of the WHERE clause must still be evaluated. These predicates are non-Boolean term. We can modify predicates in the WHERE clause to take advantage of this and improve index access. For example, the following query contains non-Boolean term predicates:

WHERE LASTNAME > 'HAAS'
OR   (LASTNAME = 'HAAS' AND MIDINIT > 'T')

A redundant predicate can be added that, while keeping the WHERE clause functionally equivalent, contains a Boolean term predicate, and thus could take better advantage of an index on the LASTNAME column:

WHERE LASTNAME >= 'HAAS'
AND  (LASTNAME > 'HAAS'
      OR (LASTNAME = 'HAAS' AND MIDINIT > 'T'))

Predicate Transitive Closure

DB2 has the ability to add redundant predicates if it determines that the predicate can be applied via transitive closure. Remember the rule of transitive closure: if a=b and b=c, then a=c. DB2 can take advantage of this to introduce redundant predicates, which gives DB2 more choices for filtering and table access sequence. For example, in a join between two tables such as this:

SELECT *
FROM   T1 INNER JOIN T2
ON     T1.A = T2.B
WHERE  T1.A = 1

DB2 will generate a redundant predicate on T2.B = 1. Predicates of the form <COL op value>, where the operation is an equals or range predicate, are available for predicate transitive closure. So are BETWEEN predicates. You can add your own redundant predicates of these forms, but DB2 will not consider them when it applies predicate transitive closure, and so they will be redundant and wasteful. However, predicates that contain a LIKE, an IN, or subqueries are not available for transitive closure, and so it may benefit you to code those predicates redundantly if you feel they can provide a benefit.
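As an illustration (a sketch only; the rewritten form shown is conceptual, since the generated predicate is internal to the optimizer), the join above effectively behaves as if it had been coded with the generated predicate:

SELECT *
FROM   T1 INNER JOIN T2
ON     T1.A = T2.B
WHERE  T1.A = 1
AND    T2.B = 1

The extra predicate on T2.B allows DB2 to filter T2 directly and to consider either table as the first table in the join sequence.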

Coding Efficient SQL

There's a saying that states "the most efficient SQL statement is the SQL statement that is never executed". Recently, someone was overheard to say that the most efficient SQL statement is the one that is never written. Funny stuff. However, there is truth in those sayings. We need to do two things when we place SQL statements into our applications. The first is that we need to minimize the number of times we go to the data server. The second is that if we need to go to the data server, we should do so in the most efficient manner. This is increasingly the case as DB2 evolves into the "ultimate" server, and more and more applications are distributed across the network and using DB2 as the database.

The number one performance issue in recent times has to be the quantity of SQL issued. The main reason behind this is typically a generic development approach or an object-oriented design. These types of designs run contrary to performance, but speed development and create flexibility and adaptability to business requirements. Remember, there is no right or wrong answer; there are only trade-offs to performance. The price you pay, however, is performance; that is, the effort required for improved performance and reduction of CPU.

Avoid Unnecessary Processing

The very first thing you need to ask yourself is if a SQL statement is necessary. You need to look at your SQL statements and ask some questions. Is the statement needed? Do I really need to run the statement again? Does the statement have a DISTINCT, and are duplicates possible? Does the statement have an ORDER BY, and is the order important? Are you repeatedly accessing code tables, and can the codes be cached within the program?

We should never find a data filtering statement in a program, as DB2 should be handling all the filtering of the rows. Whenever there is a process that needs to be performed on the data, there is a decision as to where the process is done: in the program or in DB2. If the work can be done in DB2, then leave it in DB2. There will always be that rare situation when all the data needs to be brought to the application, filtered, transformed and processed. However, this is generally due to an inadequacy somewhere else in the design, normally a shortcoming in the physical design, missing indexes, or some other implementation problem. Can this mean possibly coding multiple SQL statements against the same table for different sets of columns? Yes, it can.

Coding SQL for Performance

There are some basic guidelines for coding SQL statements for the best performance:

• Retrieve the fewest number of rows: The only rows of data returned to the application should be those actually needed for the process and nothing else.

• Retrieve only the columns needed by the program: Retrieving extra columns results in each column being moved from the page in the buffer pool to a user work area, passed to stage 2, and returned cross-memory to the program's work area. That is an unnecessary expenditure of CPU. You should code your SQL statements to return only the columns required.

• Reduce the number of SQL statements: Each SQL statement is a programmatic call to the DB2 subsystem that incurs fixed overhead for each call. Careful evaluation of program processes can reveal situations in which SQL statements are issued that don't need to be. Common modules, separate SQL that is used to retrieve current or historical data, and modules that read all data into a copy area only to have the caller use a few fields are all examples we've seen. If separate application programs are retrieving data from related tables, then this can result in extra SQL statements issued. This is especially true for programmatic joins. Code SQL joins instead of programmatic joins. In one simple test of a programmatic join of two tables in a COBOL program compared to the equivalent SQL join, the SQL join consumed 30% less CPU.

• Use stage 1 predicates: You should be aware of the stage 1 indexable, stage 1, and stage 2 predicates, and try to use stage 1 predicates for filtering whenever possible. See the section in this chapter on promoting predicates to help determine if you can convert your stage 2 predicates to stage 1, or your stage 1 non-indexable predicates to indexable.

• Use joins instead of subqueries: Joins can outperform subqueries for existence checking if there are good matching indexes for both tables involved, and if the join won't introduce any duplicate rows in the result. With a join, DB2 can take advantage of predicate transitive closure, and also pick the best table access sequence. These things are not possible with subqueries.

• Use the ON clause for all join predicates: By using explicit join syntax instead of implicit join syntax you can make a statement easier to read, easier to convert between inner and outer join, and harder to forget to code a join predicate.

• Avoid UNIONs (not necessarily UNION ALL): Are duplicates possible? If not, then replace the UNION with a UNION ALL. Otherwise, see if it is possible to produce the same result with an outer join, or, in a situation in which the subselects of the UNION are all against the same table, try using a CASE expression and only one pass through the data instead.

• Avoid unnecessary sorts: Avoiding unnecessary sorting is a requirement in any application, but more so in a high performance environment, especially with a database involved. Sorting can be caused by GROUP BY, ORDER BY, DISTINCT, INTERSECT, EXCEPT, and join processing. The worst possible scenario is where a sort is performed as the result of the SQL in a cursor, and the application only processes a subset of the data. If the SQL has to sort 100 rows and only 10 are processed by the application, then it may be better to sort in the application. In this case the sort overhead needs to be removed, or perhaps an index needs to be created to help avoid the sort. Generally, if ordering is always necessary, then there should probably be indexes to support the ordering. If ordering is required for a join process, it generally is a clear indication that an index is missing. If ordering is necessary after a join process, then hopefully the result set is small.

• Only sort truly necessary columns: When DB2 sorts data, the columns used to determine the sort order actually appear twice in the sort rows. Make sure that you specify the sort only on the necessary columns. For example, any column specified in an equals predicate in the query does not need to be in the ORDER BY clause.

• Never use generic SQL statements: Generic SQL equals generic performance. Generic SQL and generic I/O layers do not belong in high performing systems. Generic SQL generally only has logic that will retrieve some specific data or data relationship, and leaves the entire business rule processing in the application. It is more an application of common sense.

• Code the most selective predicates first: DB2 processes the predicates in a query in a specific order:
– Indexable predicates are applied first, in the order of the columns of the index
– Then other stage 1 predicates are applied
– Then stage 2 predicates are applied
Within each of the stages above, DB2 will process the predicates in this order:
– All equals predicates (including a single-value IN list and BETWEEN with only one value)
– All range predicates and predicates of the form IS NOT NULL
– Then all other predicate types
Within each grouping, DB2 will process the predicates in the order they have been coded in the SQL statement. This includes subqueries as well (within the grouping of correlated and non-correlated). Therefore, all SQL queries should be written to evaluate the most restrictive predicates first to filter unnecessary rows earlier, reducing processing cost at a later stage.

• Use the proper method for existence checking: For existence checking using a subquery, an EXISTS predicate generally will outperform an IN predicate. For general existence checking, you could code a singleton SELECT statement that contains a FETCH FIRST 1 ROW ONLY clause.

• Avoid unnecessary materialization: Running a transaction that processes little or no data? You might want to use correlated references to avoid materializing large intermediate result sets. Correlated references will encourage nested loop join and index access for transactions. See the advanced SQL and performance section of this chapter for more details.

Promoting Predicates

We need to code SQL predicates as efficiently as possible, and so when you are writing predicates you should make sure to use stage 1 indexable predicates whenever possible. If you've coded a stage 1 non-indexable predicate, or a stage 2 predicate, then you should be asking yourself if you can possibly promote those predicates to a more efficient stage. If you have a stage 1 non-indexable predicate, can it be promoted to stage 1 indexable? Take for example the following predicate:

WHERE END_DATE_COL <> '9999-12-31'

If we use the "end of the DB2 world" as an indicator of data that is still active, then why not change the predicate to something that is indexable:

WHERE END_DATE_COL < '9999-12-31'

Do you have a stage 2 predicate that can be promoted to stage 1, or even stage 1 indexable? Take for example this predicate:

WHERE BIRTHDATE + 30 YEARS < CURRENT DATE

The predicate above is applying date arithmetic to a column. This is a column expression, which makes the predicate a stage 2 predicate. By moving the arithmetic to the right side of the inequality we can make the predicate stage 1 indexable:

WHERE BIRTHDATE < CURRENT DATE - 30 YEARS

These examples can be applied as general rules for promoting predicates. It should be noted that as of DB2 9 it is possible to create an index on an expression. An index on an expression can be considered for improved performance of column expressions when it is not possible to eliminate the column expression from the query.
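To illustrate, here is a hedged sketch of a DB2 9 index on an expression; the index name is hypothetical and the sample EMP table is assumed. A predicate such as WHERE SALARY + BONUS > 100000 could then be supported by the index rather than being evaluated as a pure stage 2 column expression:

CREATE INDEX XEMP_TOTAL_COMP
  ON EMP (SALARY + BONUS ASC)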


Functions, Expressions and Performance

DB2 has lots of scalar functions, as well as other expressions, that allow you to manipulate data. This contributes to the fact that the SQL language is indeed a programming language as much as a data access language. It should be noted, however, that the processing of functions and expressions in DB2, when used not for filtering of data but for pure data manipulation, is generally more expensive than the equivalent application program process. Keep in mind that even a simple function such as concatenation:

SELECT FIRSTNME CONCAT LASTNAME FROM EMP

will be executed for every row processed. If you need the ultimate in ease of coding, time to delivery, portability, and flexibility, then code expressions and functions like this in your SQL statements. If you need the ultimate in performance, then do the manipulation of data in your application program. CASE expressions can be very expensive, but CASE expressions will utilize “early out” logic when processing. Take the following CASE expression as an example:

CASE WHEN C1 = ‘A’ OR C1 = ‘K’ OR C1 = ‘T’ OR C1 = ‘Z’ THEN 1 ELSE NULL END




If most of the time the value of the C1 column is a blank, then the following functionally equivalent CASE expression will consume significantly less CPU:

CASE WHEN C1 <> ‘ ‘ AND (C1 = ‘A’ OR C1 = ‘K’ OR C1 = ‘T’ OR C1 = ‘Z’) THEN 1 ELSE NULL END

DB2 will take the early out from the CASE expression upon the first false evaluation of an AND, or the first true evaluation of an OR. So, in addition to testing for the blank value above, all the other values should be tested with the most frequently occurring values first.

Advanced SQL and Performance

Advanced SQL queries can be a performance advantage or a performance disadvantage. Because the possibilities with advanced SQL are endless, it would be impossible to document all of the various situations and their performance impacts. Therefore, this section will give some examples and tips for improving the performance of some complex queries. Please keep in mind, however, that complex SQL will always be a performance improvement over an application program process when the complex SQL is filtering or aggregating data.
CORRELATION

In general, correlation encourages nested loop join and index access. This can be very good for your transaction queries that process very little data. However, it can be bad for your report queries that process vast quantities of data. The following query is generally a very good performer when processing large quantities of data:



SELECT TAB1.EMPNO, TAB1.SALARY, TAB2.AVGSAL, TAB2.HDCOUNT
FROM   (SELECT EMPNO, SALARY, WORKDEPT
        FROM   EMP
        WHERE  JOB = 'SALESREP') AS TAB1
LEFT OUTER JOIN
       (SELECT AVG(SALARY) AS AVGSAL, COUNT(*) AS HDCOUNT, WORKDEPT
        FROM   EMP
        GROUP BY WORKDEPT) AS TAB2
ON     TAB1.WORKDEPT = TAB2.WORKDEPT

If there is one sales rep per department, or if most of the employees are sales reps, then the above query would be the most efficient way to retrieve the data. In the query above, the entire employee table will be read and materialized in the nested table expression called TAB2. It is very likely that the merge scan join method will be used to join the materialized TAB2 to the first table expression called TAB1.




Suppose now that the employee table is extremely large, but that there are very few sales reps, or perhaps all the sales reps are in one or a few departments. Then the above query may not be the most efficient due to the fact that the entire employee table still has to be read in TAB2, but most of the results of that nested table expression won’t be returned in the query. In this case, the following query may be more efficient:

SELECT TAB1.EMPNO, TAB1.SALARY, TAB2.AVGSAL, TAB2.HDCOUNT
FROM   EMP TAB1
       ,TABLE(SELECT AVG(SALARY) AS AVGSAL, COUNT(*) AS HDCOUNT
              FROM   EMP
              WHERE  WORKDEPT = TAB1.WORKDEPT) AS TAB2
WHERE  TAB1.JOB = 'SALESREP'

While this statement is functionally equivalent to the previous statement, it operates in a very different way. In this query the employee table referenced as TAB1 will be read first, and then the nested table expression will be executed repeatedly in a nested loop join for each row that qualifies. An index on the WORKDEPT column is a must.
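As a simple illustration (the index name is hypothetical), the supporting index for the correlated reference might look like this:

CREATE INDEX XEMP_WORKDEPT
  ON EMP (WORKDEPT)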
MERGE VERSUS MATERIALIZATION FOR VIEWS AND NESTED TABLE EXPRESSIONS

When you have a reference to a nested table expression or view in your SQL statement, DB2 will possibly merge that nested table expression or view with the referencing statement. If DB2 cannot merge, then it will materialize the view or nested table expression into a work file, and then apply the referencing statement to that intermediate result. IBM states that merge is more efficient than materialization. In general, that statement is correct. However, materialization may be more efficient if your complex queries have the following combined conditions:

• Nested table expressions or view references, especially multiple levels of nesting
• Columns generated in the nested expressions or views via application of functions, user-defined functions, or other expressions
• References to the generated columns in the outer referencing statement



In general, DB2 will materialize when some sort of aggregate processing is required inside the view or nested table expression. So, typically this means that the view or nested table expression contains aggregate functions, grouping (GROUP BY), or DISTINCT. If materialization is not required then the merge process happens. Take for example the following query:

SELECT MAX(CNT)
FROM  (SELECT ACCT_RTE, COUNT(*) AS CNT
       FROM   YLA.ACCT_TABLE
       GROUP BY ACCT_RTE) AS TAB1

DB2 will materialize TAB1 in the above example. Now, take a look at the next query:

SELECT SUM(CASE WHEN COL1 = 1 THEN 1 END) AS ACCT_CURR
      ,SUM(CASE WHEN COL1 > 1 THEN 1 END) AS ACCT_LATE
FROM  (SELECT CASE ACCT_RTE
                WHEN 'AA' THEN 1
                WHEN 'BB' THEN 2
                WHEN 'CC' THEN 2
                WHEN 'DD' THEN 2
                WHEN 'EE' THEN 3
              END AS COL1
       FROM   YLA.ACCT_TABLE) AS TAB1




In this query there is no DISTINCT, GROUP BY, or aggregate function in the nested table expression, and so DB2 will merge the inner table expression with the outer referencing statement. Since there are two references to COL1 in the outer referencing statement, the CASE expression in the nested table expression will be calculated twice during query execution. The merged statement would look something like this:

SELECT SUM(CASE WHEN CASE ACCT_RTE
                       WHEN 'AA' THEN 1
                       WHEN 'BB' THEN 2
                       WHEN 'CC' THEN 2
                       WHEN 'DD' THEN 2
                       WHEN 'EE' THEN 3
                     END = 1 THEN 1 END) AS ACCT_CURR
      ,SUM(CASE WHEN CASE ACCT_RTE
                       WHEN 'AA' THEN 1
                       WHEN 'BB' THEN 2
                       WHEN 'CC' THEN 2
                       WHEN 'DD' THEN 2
                       WHEN 'EE' THEN 3
                     END > 1 THEN 1 END) AS ACCT_LATE
FROM   YLA.ACCT_TABLE

For this particular query the merge is probably more efficient than materialization. However, if you have multiple levels of nesting and many references to generated columns, merge can be less efficient than materialization. In these specific cases you may want to introduce a nondeterministic function into the view or nested table expression to force materialization. We use the RAND() function.
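As an illustration of this technique (a sketch only, reusing the hypothetical YLA.ACCT_TABLE example from above), adding a RAND()-based column to the nested table expression is enough to prevent merge and force materialization:

SELECT SUM(CASE WHEN COL1 = 1 THEN 1 END) AS ACCT_CURR
      ,SUM(CASE WHEN COL1 > 1 THEN 1 END) AS ACCT_LATE
FROM  (SELECT CASE ACCT_RTE
                WHEN 'AA' THEN 1
                WHEN 'BB' THEN 2
                WHEN 'CC' THEN 2
                WHEN 'DD' THEN 2
                WHEN 'EE' THEN 3
              END AS COL1
             ,RAND() AS FORCE_MAT
       FROM   YLA.ACCT_TABLE) AS TAB1

Because RAND() is non-deterministic, DB2 cannot merge the table expression, and the CASE expression is evaluated only once per row in the materialized result.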
UNION IN A VIEW OR NESTED TABLE EXPRESSION

You can place a UNION or UNION ALL into a view or nested table expression. This allows for some really complex SQL processing, but also enables you to create logically partitioned tables. That is, you can store data in multiple tables and then reference them all together as one table in a view. This is useful for quickly rolling through yearly tables, or to create optional table scenarios with little maintenance overhead.


While each SQL statement in a union in a view (or table expression) results in an individual query block, and SQL statements written against our view are distributed to each query block, DB2 does employ a technique to prune query blocks for efficiency. DB2 can, depending upon the query, prune (eliminate) query blocks at either statement compile time or during statement execution. Consider the following account history view:

CREATE VIEW V_ACCOUNT_HISTORY (ACCOUNT_ID, AMOUNT) AS
SELECT ACCOUNT_ID, AMOUNT
FROM   HIST1
WHERE  ACCOUNT_ID BETWEEN 1 AND 100000000
UNION ALL
SELECT ACCOUNT_ID, AMOUNT
FROM   HIST2
WHERE  ACCOUNT_ID BETWEEN 100000001 AND 200000000

then consider the following query:

SELECT *
FROM   V_ACCOUNT_HISTORY
WHERE  ACCOUNT_ID = 12000000

The predicate of this query contains the literal value 12000000, and this predicate is distributed to both of the query blocks generated. DB2 will compare the distributed predicate against the predicates coded in the UNION inside our view, looking for redundancies. In any situation in which the distributed predicate renders a particular query block unnecessary, DB2 will prune (eliminate) that query block from the access path. So, when our distributed predicates look like this:

WHERE ACCOUNT_ID BETWEEN 1 AND 100000000
AND   ACCOUNT_ID = 12000000

WHERE ACCOUNT_ID BETWEEN 100000001 AND 200000000
AND   ACCOUNT_ID = 12000000

one of the resulting combined predicates is impossible, so DB2 eliminates that query block. Although in our example above two query blocks would be generated, one of them will be pruned when the statement is compiled, and only one underlying table will then be accessed.

Query block pruning can happen at statement compile (bind) time, based upon a literal value supplied in the predicate, or at run time, if a host variable or parameter marker is supplied for a redundant predicate. So, let's take the previous query example and replace the literal with a host variable:

SELECT *
FROM   V_ACCOUNT_HISTORY
WHERE  ACCOUNT_ID = :H1

If this statement were embedded in a program, and bound into a plan or package, two query blocks would be generated, and the predicate would be distributed amongst both generated query blocks. This is because DB2 does not know the value of the host variable in advance. However, at run time DB2 will examine the supplied host variable value and dynamically prune the query blocks appropriately. So, if the value 12000000 was supplied for the host variable, then one of the two query blocks would be pruned at run time, and only one underlying table would be accessed.

This is a complicated process that does not always work. In some cases it just doesn't happen, and there are APARs out there that address the problems. So, testing is important. You should test it by stopping one of the tables, and then running a query with a host variable that should prune the query block on that table. If the statement is successful, then runtime query block pruning is working for you.

You should also be aware of the fact that there are limits to UNION in a view (or table expression). We can get query block pruning on literals and host variables, but we can't get query block pruning on joined columns. For this reason, in certain situations with many query blocks (UNIONs), many rows of data, and many index levels for the inner view or table expression of a join, we have recommended using programmatic joins in situations in which the query can benefit from runtime query block pruning using the joining column. Please be aware of the fact that this is an extremely specific recommendation, and certainly not a general recommendation, and that you should always test to see if you get bind time or run time query block pruning.

You can influence proper use of runtime query block pruning by encouraging distribution of joins and predicates into the UNION in the view (see the section below). This is done by reducing the number of tables in the UNION, or by repeating host variables in predicates instead of, or in addition to, using correlation. Take a look at the query below:

SELECT {history data columns}
FROM   V_ACCOUNT_HISTORY HIST
WHERE  HIST.ACCT_ID = :acctID
AND    HIST.HIST_EFF_DTE = '2005-08-01'
AND    HIST.UPD_TSP =
       (SELECT MAX(UPD_TSP)
        FROM   V_ACCOUNT_HISTORY HIST2
        WHERE  HIST2.ACCT_ID = :acctID
        AND    HIST2.HIST_EFF_DTE = '2005-08-01')
WITH UR

The predicate on ACCT_ID in the subquery could be correlated to the outer query, but it isn't. The reason for repeating the host variable and value references was to take advantage of the runtime pruning. Correlated predicates would not have gotten the pruning. The same goes for the HIST_EFF_DTE predicate in the subquery.

Recommendations for Distributed Dynamic SQL

DB2 for z/OS is indeed the ultimate data server, and there is a proliferation of dynamic distributed SQL hitting the system. Making sure this SQL is always performing the best can sometimes be a challenge. Here are some general recommendations for improving the performance of distributed dynamic SQL:

• Turn on the dynamic statement cache.

• Use parameter markers. Parameter markers are almost always better than literal values, because you need an exact statement match to reuse the dynamic statement cache. Literal values are only helpful when there is a skewed distribution of data, and that skewed distribution is properly reflected via frequency distribution or histogram statistics in the system catalog.

• Use a fixed application level authorization id. This allows accounting data to be grouped by the id, and applications can be identified by their id. It also makes for a more optimal use of the dynamic statement cache in that statements are reused by authorization id.

• Use connection pooling.

• Use the EXPLAIN STMTCACHE ALL command to expose all the statements in the dynamic statement cache. The output from this command goes into a table that contains all sorts of runtime performance information (see the sketch after this list). In addition, you can EXPLAIN individual statements in the dynamic statement cache once they have been identified.

• Consolidate calls into stored procedures. This only works if multiple statements and program logic, representing a transaction, are consolidated together in a single stored procedure. Be aware of the fact that if you are moving a local batch process to a remote server, you are going to lose some of the efficiencies that go along with the RELEASE(DEALLOCATE) bind parameter, in particular sequential detection and index lookaside.

• Use the DB2 CLI/ODBC/JDBC Static Profiling feature. This CLI feature can be used to turn dynamic statements into static statements. Static profiling enables you to capture dynamic SQL statements on the client, either via JDBC, ODBC, or DB2 CLI, and then bind the statements on the server into a package. Once the statements are bound, the initialization file can be modified, instructing the interface software to use the static statement whenever the equivalent dynamic statement is issued from the program. This is documented in the DB2 Call Level Interface Guide and Reference, Volume 1.
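Referring back to the statement cache recommendation above, here is a hedged sketch of how the cache might be externalized and examined; the explain table name and columns reflect the DB2 9 DSN_STATEMENT_CACHE_TABLE as we recall it, and should be verified against your installation:

EXPLAIN STMTCACHE ALL;

SELECT STMT_ID, STAT_EXEC, STAT_CPU, STMT_TEXT
FROM   DSN_STATEMENT_CACHE_TABLE
ORDER BY STAT_CPU DESC;

The second query simply lists the cached statements in descending CPU order so that the most expensive candidates for tuning surface first.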

Influencing the Optimizer

The DB2 optimizer is very good at picking the correct access path, as long as it is given the proper information. In situations in which it does not pick the optimal access path, it typically has not been given enough information. This is sometimes the case when we are using parameter markers or host variables in a statement.

Proper Statistics

What can we say? DB2 utilizes a cost-based optimizer, and that optimizer needs accurate statistical information about your data. Collecting the proper statistics is a must for good performance. With each new version of DB2 the optimizer takes more advantage of catalog statistics, and of runtime reoptimization for skewed data distributions. This also means that with each new version DB2 is more dependent upon catalog statistics.

You should have statistics on every column referenced in every WHERE clause in your shop. If you are using parameter markers and host variables, then at the least you need cardinality statistics. If you have skewed data, or are using literal values in your SQL statements, then perhaps you need frequency distribution and/or histogram statistics. If you suspect columns are correlated, you can gather column correlation statistics. To determine if columns are correlated you can run these two queries (DB2 V8 and DB2 9):

SELECT COUNT(DISTINCT CITY) * COUNT(DISTINCT STATE) AS CITYSTATECNT
FROM   CUSTOMER

SELECT COUNT(*) AS FINLCNT
FROM  (SELECT DISTINCT CITY, STATE FROM CUSTOMER) AS FINLCNT

If the number from the second query is lower than the number from the first query, then the columns are correlated. You can also run GROUP BY queries against tables, for columns used in predicates, to count the occurrences of values in those columns. These counts can give you a good indication as to whether or not you need frequency distribution statistics or histogram statistics.
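As a hedged sketch of the kind of RUNSTATS control statement that collects these statistics, the following gathers cardinalities plus the ten most frequent values for the correlated column group; the database, table space, and table names are hypothetical, and the exact syntax should be verified in the utility guide for your DB2 version:

RUNSTATS TABLESPACE DBCUST.TSCUST
  TABLE(CUSTOMER)
  COLGROUP(CITY, STATE) FREQVAL COUNT 10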

Runtime Reoptimization

If your query contains a predicate with an embedded literal value, then DB2 knows something about the input to the query, and can take advantage of frequency distribution or histogram statistics if available. However, what if DB2 doesn't know anything about your input value:

SELECT *
FROM   EMP
WHERE  MIDINIT > :H1

In this case, if the values for MIDINIT are highly skewed, DB2 could make an inaccurate estimate of the filter factor for some input values. DB2 can employ something called runtime reoptimization to help your queries.

For static SQL, the option of REOPT(ALWAYS) is available. This bind option will instruct DB2 to recalculate access paths at runtime using the host variable values. This can result in a much improved filter factor, better access path decisions by the optimizer, and improved execution time for large queries. However, if there are many queries in the package, then they will all get reoptimized, and this could negatively impact statement execution time for those queries. In situations where you use REOPT(ALWAYS), consider separating the query that can benefit into its own package.

For dynamic SQL statements there are three options: REOPT(ALWAYS), REOPT(ONCE), and REOPT(AUTO). REOPT(AUTO) is DB2 9 only.

• REOPT(ALWAYS): Reoptimizes a dynamic statement with parameter markers, based upon the values provided, on every execution.

• REOPT(ONCE): Reoptimizes a dynamic statement the first time it is executed, based upon the values provided for the parameter markers. The access path will then be reused until the statement is removed from the dynamic statement cache and needs to be prepared again. This reoptimization option should be used with care, as the first execution should have good representative values.

• REOPT(AUTO): Tracks how the values for the parameter markers change on every execution, and will then reoptimize the query based upon those values if it determines that the values have changed significantly.

There is also a system parameter called REOPTEXT (DB2 9) that enables REOPT(AUTO)-like behavior, subsystem wide, for any dynamic SQL queries (without NONE, ALWAYS, or ONCE already specified) that contain parameter markers, when it detects changes in the values that could influence the access path.
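As a brief illustration (the collection and package names are hypothetical), isolating a query into its own package and rebinding it with reoptimization might look like this:

REBIND PACKAGE(MYCOLL.BIGQRYPKG) REOPT(ALWAYS)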

OPTIMIZE FOR Clause

One thing that the optimizer certainly does not know about your query is how much data you are actually going to fetch from the query. DB2 is going to use the system catalog statistics to get an estimate of the number of rows that will be returned if the entire query is processed by the application. So, if you are not going to read all the rows of the result set, then you should use the OPTIMIZE FOR n ROWS clause, where n is the number of rows you intend to fetch. The OPTIMIZE FOR clause is a way, within a SQL statement, to tell DB2 how many rows you intend to process. DB2 can then make access path decisions in order to determine the most efficient way to access the data for that quantity. The use of this clause will discourage such things as list prefetch, sequential prefetch, and multi-index access. It will encourage index usage to avoid a sort, and a join method of nested loop join. A value of 1 is the strongest influence on these factors. You should actually put the number of rows you intend to fetch in the OPTIMIZE FOR clause; incorrectly representing the number of rows you intend to fetch can result in a more poorly performing query.

Encouraging Index Access and Table Join Sequence

There are certain techniques you can employ to encourage DB2 to choose a certain index or table access sequence in a query that accesses multiple tables. When DB2 joins tables together in an inner join, it attempts to select the table that will qualify the fewest rows first in the join sequence. If, for some reason, DB2 has chosen the incorrect table first (maybe due to statistics or host variables), then you can attempt to change the table access sequence by employing one or more of these techniques:

• Enable predicates on the table you want to be first. By increasing the potential matchcols on this table, DB2 may select an index for more efficient access and change the table access sequence.

• Disable predicates on the table you don't want accessed first. We do not recommend using predicate disablers. Predicate disablers are documented in the DB2 Administration Guide (DB2 V7, DB2 V8) or the DB2 Performance Monitoring and Tuning Guide (DB2 9).

• Convert joins to subqueries. When you code subqueries, you tell DB2 the table access sequence. Non-correlated subqueries are accessed first, then the outer query is executed, and then any correlated subqueries are executed. Of course, this is only effective if the table moved from a join to a subquery doesn't have to return data.

• Convert an inner join to a left join. By coding a left join you have absolutely dictated the table join sequence to DB2.

• Convert a joined table to a correlated nested table expression. This will force another table to be accessed first, as the data for the correlated reference is required prior to the table in the correlated nested table expression being accessed. This could change the join type as well as the join sequence.

• Force materialization of the table you want accessed first by placing it into a nested table expression with a DISTINCT or GROUP BY. This technique is especially useful when a nested loop join is randomly accessing the inner table.

• Add a CAST function to the join predicate for the table you want accessed first. By placing this function on that column you will encourage DB2 to access that table first in order to avoid a stage 2 predicate against the second table.

• Try coding a predicate on the table you want accessed first as a non-transitive closure predicate, for example a non-correlated subquery against the SYSDUMMY1 table that returns a single value, rather than an equals predicate on a host variable or literal value. Since the subquery is not eligible for transitive closure, DB2 will not generate the predicate redundantly against the other table, and so has less encouragement to choose that other table first.

• Code an ORDER BY clause on the columns of the index of the table that you want to be accessed first in the join sequence. This may influence DB2 to use that index to avoid the sort, and access that table first.

• Try using the OPTIMIZE FOR clause.

• Increase the index matchcols, either by modifying the query or the index.

• Change the order of the tables in the FROM clause. You can also try converting from implicit join syntax to explicit join syntax, and vice versa.

If DB2 has chosen one index over another, and you disagree, then you can try one of these techniques to influence index selection:

• Code an ORDER BY clause on the leading columns of the index you want chosen. This may encourage DB2 to choose that index to avoid a sort.

• Add columns to the index to make the access index-only.

• You could disable predicates that are matching other indexes. The IBM manuals document predicate disablers. We don't recommend them.


CHAPTER 4
Table and Index Design for Performance

It is very important to make the correct design choices when designing physical objects such as tables, table spaces, and indexes, because once a physical structure has been defined and implemented, it is generally difficult and time-consuming to make changes to the underlying structure. DB2 objects need to be designed for availability, ease of maintenance, and overall performance, as well as for business requirements, but how each of these is measured will depend on the business and the nature of the data.

The best way to perform logical database modeling is to use strong guidelines developed by an expert in relational data modeling, or to use one of the many relational database modeling tools supplied by vendors. But it is important to remember that just because you can 'press a button' to have a tool migrate your logical model into a physical model does not mean that the physical model is the most optimal for performance. There is nothing wrong with twisting the physical design to improve performance as long as the logical model is not compromised or destroyed. There are guidelines and recommendations for achieving these design goals.

General Table Space Performance Recommendations

Here are some general recommendations for table space performance.

Use Segmented or Universal Table Spaces

Support for simple table spaces is going away. Although you can create them in DB2 V7 and DB2 V8, they cannot be created in DB2 9 (although you can still use previously created simple table spaces). Whatever version you are on, you should be using a segmented table space over a simple table space. In simple table spaces, rows are intermixed on pages, and if one table page is locked, it can inadvertently lock a row of another table just because it is on the same page. When you only have one table per table space this is not an issue; however, there are still several benefits to having a segmented table space for one table.

There are several advantages to using segmented table spaces. Since the pages in a segment will only contain rows from one table, there will be no locking interference with other tables. If a table scan is performed, the segments belonging to the table being scanned are the only ones accessed, and empty pages will not be scanned. If a mass delete or a DROP table occurs, segment pages are available for immediate reuse, and it is not necessary to run a REORG utility. Mass deletes are also much faster for segmented table spaces, and they produce less logging (as long as the table has not been defined with DATA CAPTURE CHANGES). Also, the COPY utility will not have to copy empty pages left by a mass delete. When inserting records, some read operations can be avoided by using the more comprehensive space map of the segmented table space.
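As a simple illustration (a hedged sketch; the database, table space, and buffer pool names are hypothetical), a segmented table space is created by specifying the SEGSIZE clause:

CREATE TABLESPACE TSCUST
  IN DBCUST
  SEGSIZE 64
  LOCKSIZE PAGE
  BUFFERPOOL BP1;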

DB2 9 introduces a new type of table space called a universal table space. A universal table space is a table space that is both segmented and partitioned. Two types of universal table spaces are available: the partition-by-growth table space and the range-partitioned table space.

A universal table space offers the following benefits:

• Better space management relative to varying-length rows: A segmented space map page provides more information about free space than a regular partitioned space map page.
• Improved mass delete performance: Mass delete in a segmented table space organization tends to be faster than in table spaces that are organized differently. In addition, you can immediately reuse all or most of the segments of a table after the table has been dropped or a mass delete has been performed.

Partitioned tables provide more granular locking and parallel operations by spreading the data over more data sets. Before DB2 9, partitioned tables required key ranges to determine the target partition for row placement. Now, in DB2 9, you have the option to partition according to data growth, which enables segmented tables to be partitioned as they grow, without the need for key ranges. As a result, segmented tables benefit from increased table space limits and SQL and utility parallelism that were formerly available only to partitioned tables. You can implement partition-by-growth table space organization in several ways (a DDL sketch follows at the end of this section):

• You can use the new MAXPARTITIONS clause on the CREATE TABLESPACE statement to specify the maximum number of partitions that the partition-by-growth table space can accommodate. The value that you specify in the MAXPARTITIONS clause is used to protect against run-away applications that perform an insert in an infinite loop.
• You can use the MAXPARTITIONS clause on the ALTER TABLESPACE statement to alter the maximum number of partitions to which an existing partition-by-growth table space can grow. This ALTER TABLESPACE operation acts as an immediate ALTER.

A range-partitioned table space is a type of universal table space that is based on partitioning ranges and that contains a single table. You can create a range-partitioned table space by specifying both the SEGSIZE and NUMPARTS keywords on the CREATE TABLESPACE statement. With a range-partitioned table space, you can also control the partition size, choose from a wide array of indexing options, and take advantage of partition-level operations and parallelism capabilities. Because the range-partitioned table space is also a segmented table space, you can run table scans at the segment level, and as a result, you can immediately reuse all or most of the segments of a table.

Range-partitioned universal table spaces follow the same partitioning rules as partitioned table spaces in general. That is, you can add, rebalance, and rotate partitions, and you can avoid needing to reorganize a table space to change the limit keys. The new range-partitioned table space does not replace the existing partitioned table space, and operations that are supported on a regular partitioned or segmented table space are supported on a range-partitioned table space. The maximum number of partitions possible for both range-partitioned and partition-by-growth universal table spaces, as for partitioned table spaces, is controlled by the DSSIZE and page size.
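To illustrate (a hedged sketch; the database and table space names are hypothetical, and the exact clauses should be verified against the DB2 9 SQL Reference), a partition-by-growth table space and a range-partitioned universal table space might be created like this:

CREATE TABLESPACE TSORDERS
  IN DBSALES
  MAXPARTITIONS 24
  SEGSIZE 64
  LOCKSIZE ANY;

CREATE TABLESPACE TSHIST
  IN DBSALES
  NUMPARTS 12
  SEGSIZE 64;

The first statement relies on MAXPARTITIONS for partition-by-growth organization; the second combines NUMPARTS and SEGSIZE for a range-partitioned universal table space.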

Clustering and Partitioning

The clustering index is NOT always the primary key. It generally is not the primary key but a sequential range retrieval key, and should be chosen by the most frequent range access to the table data. Range and sequential retrieval are the primary requirement, but partitioning is another requirement and can be the more critical requirement. The basic underlying rule to clustering is that if your application is going to have a certain sequential access pattern, or a regular batch process, you should cluster the data according to that input sequence. If you do not specify an explicit clustering index, DB2 will cluster by the index that is the oldest by definition (often referred to as the first index created). If the oldest index is dropped and recreated, that index will now be a new index, and clustering will now be by the next oldest index. There is a way to dismiss clustering for inserts; see the section in this chapter on append processing.

Clustering and partitioning can be completely independent, and we're given a lot of options for organizing our data in a single dimension (clustering and partitioning are based on the same key), dual dimensions (clustering inside each partition by a different key), or multiple dimensions (combining different tables with different partitioning unioned inside a view). You should choose a partitioning strategy based upon a concept of application controlled parallelism, grouping data by time, separating old and new data, or grouping data by some meaningful business entity (e.g. sales region, office location). Then within those partitions you can cluster the data by your most common sequential access sequence.

There are several advantages to partitioning a table space. For large tables, partitioning is the only way to store large amounts of data, especially as tables get extremely large in size; non-partitioned table spaces are limited to 64 GB of data. But partitioning also has advantages for tables that are not necessarily large. DB2 allows us to define up to 4096 partitions of up to 64 GB each (however, total table size is limited depending on the DSSIZE specified). You can take advantage of the ability to execute utilities on separate partitions in parallel, which also gives you the ability to access data in certain partitions while utilities are executing on others. In a data-sharing environment, you can spread partitions among several members to split workloads. You can also spread your data over multiple volumes, and need not use the same storage group for each data set belonging to the table space. This also allows you to place frequently accessed partitions on faster devices.

Free Space

The FREEPAGE and PCTFREE clauses are used to help improve the performance of updates and inserts by allowing free space to exist on table spaces and index spaces. Performance improvements include improved access to the data through better clustering of data, faster inserts, fewer row overflows, and a reduction in the number of REORGs required. Some tradeoffs include an increase in the number of pages, fewer rows per I/O and less efficient use of buffer pools, and more pages to scan. As a result, it is important to achieve a good balance for each individual table space and index space when deciding on free space, and that balance will depend on the processing requirements of each table space or index space. When inserts and updates are performed, DB2 will use the free space defined, and by doing this it can keep records in clustering sequence as much as possible. When the free space is used up, the records must be located elsewhere, and this is when performance can begin to suffer.

When DB2 attempts to maintain clustering during inserting and updating, it will search nearby for free space and/or free pages for the row. If this space is not found, DB2 will exhaustively search the table space for a free place to put the row before extending a segment or a data set. You can notice this activity by gradually increasing insert CPU times in your application (by examining the accounting records), as well as increasing getpage counts and relocated row counts. When this happens it's time for a REORG, and perhaps a reevaluation of your free space quantities.

Read-only tables do not require any free space, and tables with a pure insert-at-end strategy (append processing) generally don't require free space either. Exceptions to this would be tables with VARCHAR columns, and tables using compression, that are subject to updates.

Allocations

The PRIQTY and SECQTY clauses of the CREATE TABLESPACE and ALTER TABLESPACE SQL statements specify the space that is to be allocated for the table space, if the table space is managed by DB2. These settings influence the allocation by the operating system of the underlying VSAM data sets in which table space and index space data is stored. The PRIQTY clause specifies the minimum primary space allocation for a DB2-managed data set of the table space or partition. The primary space allocation is in kilobytes, and the maximum that can be specified is 64 GB. DB2 will request a data set allocation corresponding to the primary space allocation, and the operating system will attempt to allocate the initial extent for the data set in one contiguous piece. The SECQTY clause specifies the minimum secondary space allocation for a DB2-managed data set of the table space or partition. DB2 will request secondary extents in a size according to the secondary allocation.

You can specify the primary and secondary space allocations for table spaces and indexes, or allow DB2 to choose them. Having DB2 choose the values, especially for the secondary space quantity, increases the possibility of reaching the maximum data set size before running out of extents. In addition, the MGEXTSZ subsystem parameter will influence the SECQTY allocations; when set to YES (NO is the default) it changes the space calculation formulas to help utilize all of the potential space allowed in the table space before running out of extents. You can alter the primary and secondary space allocations for a table space. The secondary space allocation will take immediate effect. However, since the primary allocation happens when the data set is created, that allocation will not take effect until a data set is added (depending upon the type of table space) or until the data set is recreated via a utility execution (such as a REORG or LOAD REPLACE). Keep in mind that the actual primary and secondary data set sizes depend upon a variety of settings and installation parameters.

Column Ordering

There are two reasons you want to order your columns in specific ways: to reduce CPU consumption when reading and writing columns with variable length data, and to minimize the amount of logging performed when updating rows. Which version of DB2 you are using will impact how you, or how DB2, organizes your columns.
Which version of DB2 you are using will impact how you. You can notice this activity by gradually increasing insert CPU times in you application (by examining the accounting records) as well as increasing getpage counts and relocated row counts. since the primary allocation happens when the data set is created. or how DB2. and tables with a pure insert-at-end strategy (append processing) generally don’t require free space. Exceptions to this would be tables with VARCHAR columns and tables using compression that are subject to updates. DB2 will request a data set allocation corresponding to the primary space allocation. organizes your columns. especially for the secondary space quantity. The SECQTY specifies the minimum secondary space allocation for a DB2-managed data set of the table space or partition. If this space is not found DB2 will exhaustively search the table space for a free place to put the row before extending a segment or a data set.com . In addition. Allocations The PRIQTY and SECQTY clauses of the CREATE TABLESPACE and ALTER TABLESPACE SQL statements specify the space that is to be allocated for the table space if the table space is managed by DB2. and the operating system will attempt to allocate the initial extent for the data set in one contiguous piece. However. the MGEXTSZ subsystem parameter will influence the SECQTY allocations. When this happens it’s time for a REORG. The secondary space allocation will take immediate effect.TABLE AND INDEX DESIGN FOR PERFORMANCE require any free space. and the maximum that can be specified is 64 GB. You can specify the primary and secondary space allocations for table spaces and indexes or allow DB2 to choose them. When DB2 attempts to maintain cluster during inserting and updating it will search nearby for free space and/or free pages for the row. and a perhaps a reevaluation of your free space quantities. Column Ordering There are two reasons you want to order your columns in specific ways. the actual primary and secondary data set sizes depend upon a variety of settings and installation parameters. These settings influence the allocation by the operating system of the underlying VSAM data sets in which table space and index space data is stored. However. increases the possibility of reaching the maximum data set size before running out of extents. and when set to YES (NO is the default) changes the space calculation formulas to help utilize all of the potential space allowed in the table space before running out of extents. and to minimize the amount of logging performed when updating rows. 54 ca. The primary space allocation is in kilobytes. then that allocation will not take affect until a data set is added (depends upon the type of table space) or until the data set is recreated via utility execution (such as a REORG or LOAD REPLACE). The PRIQTY specifies the minimum primary space allocation for a DB2-managed data set of the table space or partition. You can alter the primary and secondary space allocations for a table space. to reduce CPU consumption when reading and writing columns with variable length data. Having DB2 choose the values. DB2 will request secondary extents in a size according to the secondary allocation.

and the table space is using the reordered row format. To reduce the logging in these situations you can still order the columns such that the most frequently updated columns are last. you may want to organize your data differently (read on). depending on the SQL workload and the amount of compression: • Higher buffer pool hit ratios • Fewer I/Os • Fewer getpage operations • Reduced CPU time for image copies CA PERFORMANCE HANDBOOK FOR DB2 FOR z/OS 55 . In addition to new table spaces any table spaces that are REORGed or LOAD REPLACEd will get the reordered row format. In this case you have no control over the placement of never changing variable length rows in front of always changing fixed length rows. In many cases. in DB2 V7 or DB2 V8 you want to put the variable length columns after the fixed length columns when defining your table.TABLE AND INDEX DESIGN FOR PERFORMANCE For reduced CPU when using variable length columns you’ll want to put your variable length columns after all of your fixed length columns (DB2 V7 and DB2 V8). Within each grouping (fixed and variable) your DDL column order is respected. This can possibly mean increased logging for your heavy updaters. you’ll want you variable length columns that never change in front of the fixed length columns that do change (DB2 V7 and DB2 V8) in order to reduce logging. Once you move to new function mode in DB2 9 any new tablespace you create will automatically have its variable length columns placed after the fixed length columns physically in the table space. If you mix the variable length columns and your fixed length columns together then DB2 will have to search for any fixed or variable length column after the first variable length column. then followed by the columns that changed all the time (e. and this will increase CPU consumption. This is especially true for any read-only applications. For reduced logging you’ll want to order the rows in your DDL a little differently. followed by the columns that change less frequently. an update timestamp).g. For applications in which the rows are updated. So. Things change with DB2 9 as it employs something called reordered row format. For high update tables you’ll want the columns that never changed placed first in the row. and first byte changed to end of the row for variable length rows if the length changes (unless the table has been defined with DATA CAPTURE CHANGES which will cause the entire before and after image to be logged for updates). So. using the COMPRESS clause can significantly reduce the amount of DASD space needed to store data. This all changes once you’ve moved to DB2 9. Compression allows us to get more rows on a page and therefore see many of the following performance benefits. but the compression ratio achieved depends on the characteristics of the data. This is because DB2 will record the first byte changed to last byte changed for fixed length rows. Utilizing Table Space Compression Using the COMPRESS clause of the CREATE TABLESPACE and ALTER TABLESPACE SQL statements allows for the compression of data in a table space or in a partition of a partitioned table space. regardless of the column ordering in the DDL. You can also contact IBM about turning off the automatic reordered row format if this is a concern for you. and DB2 will respect the order of the columns within the grouping.

Utilizing Table Space Compression
Using the COMPRESS clause of the CREATE TABLESPACE and ALTER TABLESPACE SQL statements allows for the compression of data in a table space, or in a partition of a partitioned table space. In many cases, using the COMPRESS clause can significantly reduce the amount of DASD space needed to store data, but the compression ratio achieved depends on the characteristics of the data. Compression allows us to get more rows on a page and therefore see many of the following performance benefits, depending on the SQL workload and the amount of compression:

• Higher buffer pool hit ratios
• Fewer I/Os
• Fewer getpage operations
• Reduced CPU time for image copies

There are also some considerations for processing cost when using compression:

• The processor cost to decode a row using the COMPRESS clause is significantly less than the cost to encode that same row.
• The data access path DB2 uses affects the processor cost for data compression. In general, the relative overhead of compression is higher for table space scans and less costly for index access.

Some data will not compress well, so you should query the PAGESAVE column in SYSIBM.SYSTABLEPART to be sure you are getting a savings (at least 50% is average). Data that does not compress well includes binary data, encrypted data, and repeating strings. Keep in mind that when you compress, the row is treated as varying length, with a length change when it comes to updates. This means there is a potential for row relocation causing high numbers in NEARINDREF and FARINDREF. This means you are now doing more I/O to get to your data because it has been relocated, and you will have to REORG to get it back to its original position. Also, you should never compress small tables/rows if you are worried about concurrency issues, as this will put more rows on a page.
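A hedged example of turning compression on and then checking the savings; the object names are placeholders, the compression dictionary is only built by a subsequent REORG or LOAD, and PAGESAVE is populated when statistics are gathered:

ALTER TABLESPACE ACCTDB.ACCTTS COMPRESS YES;

SELECT DBNAME, TSNAME, PARTITION, PAGESAVE
FROM   SYSIBM.SYSTABLEPART
WHERE  DBNAME = 'ACCTDB'
ORDER  BY PAGESAVE;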

Utilizing Constraints
Referential integrity (RI) allows you to define required relationships between and within tables. The database manager maintains these relationships, which are expressed as referential constraints, and requires that all values of a given attribute or table column also exist in some other table column. In general, DB2 enforced referential integrity is much more efficient than coding the equivalent logic in your application program. In addition, having the relationships enforced in a central location in the database is much more powerful than making it dependent upon application logic. Keep in mind that you are going to need indexes to support the relationships enforced by DB2. Remember that referential integrity checking has a cost associated with it, and can become expensive if used for something like continuous code checking. RI is meant for parent/child relationships, not code checking. Better options for this include check constraints, or even better, putting codes in memory and checking them there.

Table check constraints will enforce data integrity at the table level. A table check constraint can be defined at table creation time or later, using the ALTER TABLE statement. The table-check constraints can help implement specific rules for the data values contained in the table by specifying the values allowed in one or more columns in every row of a table. Once a table-check constraint has been defined for a table, every UPDATE and INSERT statement will involve checking the restriction or constraint, but that cost is relatively low. If the constraint is violated, the data record will not be inserted or updated, and a SQL error will be returned. This can save time for the application developer, since the validation of each data value can be performed by the database and not by each of the applications accessing the database. However, check constraints should, in general, not be used for data edits in support of data entry. It's best to cache code values locally within the application and perform the edits local to the application. This will avoid numerous trips to the database to enforce the constraints.

The Holy Grail of Efficient Database Design
When designing your database you should set out with a single major goal in mind for your tables that will contain a significant amount of data, and lots of DML activity. That goal is one index per table. These indexes would need to support:

• Insert strategy
• Primary key
• Foreign key (if a child table)
• SQL access path
• Clustering

You can head down this path by respecting some design objectives:

• Avoid surrogate keys. Use meaningful business keys instead
• Let the child table inherit the parent key as part of the child's primary key
• Cluster all tables in a common sequence
• Determine the common access paths, respect them, and try not to change them during design
• Never entertain a "just in case" type of design mentality

If you can achieve this goal then you've captured the holy grail of efficient database design.

Indexing
Depending upon your application and the type of access, indexing can be a huge performance advantage or a performance bust. When does it matter? Well, of course it depends. Is your application a heavy reader, or perhaps even a read-only application? Then lots of indexes can be a real performance benefit. What if your application is constantly inserting, updating, and deleting from your table? Then in that case maybe lots of indexes can be a detriment. Just remember this simple rule: if you are adding a secondary index to a table, then for inserts and deletes, and perhaps even updates, you are adding another random read to these statements. Can your application afford that in support of queries that may use the index? That's for you to decide.

General Index Design Recommendations
Once you've determined your indexes you need to design them properly for performance. Here are some general index design guidelines.

Index Free Space
Setting the PCTFREE and FREEPAGE for your indexes depends upon how much insert and delete activity is going to occur against those indexes. For indexes that have little or no inserts and deletes (updates that change key columns are actually inserts and deletes), you can probably use a small PCTFREE with no free pages. For indexes with heavy changes you should consider larger amounts of free space. If you don't have enough free space you could get an increased frequency of index page splits. When DB2 splits a page it's going to look for a free page in which to place one of the split pages. If it does not find a page nearby it will exhaustively search the index for a free page. This could lead to CPU and locking problems for very large indexes. Keep in mind that adding free space may increase the number of index levels, and subsequently increase the amount of I/O for random reads. The best thing to do is to set a predictive PCTFREE that anticipates growth over a period of time such that you will never split a page. Then you should monitor the frequency of page splits to determine when to REORG the index, or establish a regular REORG policy for that index.

Index Compression
As of DB2 9 an index can be defined with the COMPRESS YES option (COMPRESS NO is the default). Index compression can be used where there is a desire to reduce the amount of disk space an index consumes. A buffer pool that is used to create the index must be 8K, 16K, or 32K in size. The physical page size for the index on disk will be 4K. The reason that the buffer pool size is larger than the page size is that index compression only saves space on disk; the data in the index page is expanded when read into the pool. Index compression is recommended for applications that do sequential insert operations with few or no delete operations. Random inserts and deletes can adversely affect compression. Index compression is also recommended for applications where the indexes are created primarily for scan operations; index compression can possibly save you read time for sequential operations, and perhaps random operations (but that is far less likely). Index compression can have a significant impact on REORGs and index rebuilds, resulting in significant savings in this area. Keep in mind, however, that if you use the COPY utility to back up an index, that image copy is actually uncompressed.
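A hedged sketch combining these recommendations; the index name, key, and free space values are illustrative only, and COMPRESS YES together with the larger buffer pool applies only to DB2 9:

CREATE INDEX ACCT_HIST_IX2
  ON ACCT_HIST (ACCT_ID, HIST_EFF_DTE)
  PCTFREE 10
  FREEPAGE 31
  COMPRESS YES
  BUFFERPOOL BP8K0;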

Secondary Indexes
There are two types of secondary indexes: non-partitioning secondary indexes and data partitioned secondary indexes.

NON-PARTITIONING SECONDARY INDEXES
NPSIs are indexes that are used on partitioned tables. They are not the same as the clustered partitioning key, which is used to order and partition the data; rather, they are for access to the data. NPSIs can be unique or nonunique. As of DB2 V8 and beyond the NPSI can be the clustering index. While you can have only one clustered partitioning index, you can have several NPSIs on a table if necessary. NPSIs are great for fast read access as there is a single index b-tree structure. They can, however, grow extremely large and become a maintenance and availability issue. NPSIs can be broken apart into multiple pieces (data sets) by using the PIECESIZE clause on the CREATE INDEX statement. Pieces can vary in size from 254 KB to 64 GB; the best size will depend on how much data you have and how many pieces you want to manage. If you have several pieces, you can achieve more parallelism on processes, such as heavy INSERT batch jobs, by alleviating the bottlenecks caused by contention on a single data set.

DATA PARTITIONED SECONDARY INDEXES
The DPSI index type provides us with many advantages for secondary indexes on a partitioned table space over the traditional NPSIs (Non-Partitioning Secondary Indexes) in terms of availability and performance. The partitioning scheme of the DPSI will be the same as the table space partitions, and the index keys in index partition 'x' will match those in partition 'x' of the table space. Some of the benefits that this provides include:

• Clustering by a secondary index
• Ability to easily rotate partitions
• Efficient utility processing on secondary indexes (no BUILD-2 phase)
• Allow for reducing overhead in data sharing (affinity routing)

DRAWBACKS OF DPSIS
While there will be gains in furthering partition independence, some queries may not perform as well. If the query has predicates that reference columns in a single partition, and is therefore restricted to a single partition of the DPSI, it will benefit from this new organization. However, if the predicate references only columns in the DPSI, it may not perform very well because it may need to probe several partitions of the index. The queries will have to be designed to allow for partition pruning through the predicates in order to accomplish this. This means that at least the leading column of the partitioning key has to be supplied in the query in order for DB2 to prune (eliminate) partitions from the query access path. Other limitations to using DPSIs include the fact that they cannot be unique (with some exceptions in DB2 9) and they may not be the best candidates for ORDER BYs.
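Hedged examples of each type; the table, index names, keys, and piece size are assumptions made purely for illustration:

-- NPSI spread across multiple data sets to relieve contention
CREATE INDEX ACCT_HIST_NPSI1
  ON ACCT_HIST (INVOICE_NUMBER)
  PIECESIZE 2G;

-- DPSI: the index is physically partitioned like the table space
CREATE INDEX ACCT_HIST_DPSI1
  ON ACCT_HIST (PAYMENT_TYPE, HIST_EFF_DTE)
  PARTITIONED;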

Rebuild or Recover?
As of DB2 V8 you can define an index as COPY YES. This means that, as with a table space, you can use the COPY and RECOVER utilities to back up and recover these indexes. This may be especially useful for very large indexes. You should carefully consider whether your strategy for an index should be backup and recover, or rebuild. Be aware that large NPSIs cannot be copied in pieces, and can get very large in size. You'll need to have large data sets to hold the backup. This could mean large quantities of tapes, or perhaps even hitting the 59 volume limit for a data set on DASD. REBUILD, on the other hand, will require large quantities of temporary DASD to support sorts, as well as more CPU than a RECOVER.

Special Types of Tables Used for Performance
Here are some table designs that are part of the DB2 product offering and are intended to help you boost the performance of your applications.

Materialized Query Tables
Decision support queries are often difficult and expensive. They typically operate over a large amount of data, and may have to scan or process terabytes of data and possibly perform multiple joins and complex aggregations. With these types of queries traditional optimization and performance is not always optimal. Most of these types of problems are addressed by OLAP tools, but MQTs are the first step. As of DB2 V8, one solution can be the use of MQTs (Materialized Query Tables). MQTs provide the means to save the results of prior queries and then reuse the common query results in subsequent queries. This helps avoid redundant scanning, aggregating, and joins. This allows you to precompute whole or parts of each query and then use the computed results to answer future queries. MQTs are useful for data warehouse type applications. The main advantage of the MQT is that DB2 is able to recognize a summary query against the source table(s) for the MQT, and rewrite the query to use the MQT instead. It is, however, your responsibility to move data into the MQT, either via a REFRESH TABLE command, or by manually moving the data yourself. MQTs do not completely eliminate optimization problems, but rather move optimization issues to other areas. Some challenges include finding the best MQT for an expected workload, maintaining the MQTs when underlying tables are updated, the ability to recognize the usefulness of an MQT for a query, and the ability to determine when DB2 will actually use the MQT for a query.
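A minimal MQT sketch; the table, columns, and maintenance choices are hypothetical and only illustrate the general shape of the DDL:

CREATE TABLE PAY_SUMMARY (ACCT_ID, PAY_YEAR, TOTAL_PAID) AS
 (SELECT ACCT_ID, YEAR(PAY_DATE), SUM(PAY_AMT)
  FROM   PAY_HIST
  GROUP  BY ACCT_ID, YEAR(PAY_DATE))
 DATA INITIALLY DEFERRED
 REFRESH DEFERRED
 MAINTAINED BY SYSTEM
 ENABLE QUERY OPTIMIZATION;

-- repopulate the MQT from the source table
REFRESH TABLE PAY_SUMMARY;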

Volatile Tables
As of DB2 V8, volatile tables are a way to prefer index access over table space scans or nonmatching index scans for tables that have statistics that make them appear to be small. They are good for tables that shrink and grow, allowing matching index scans on tables that have grown larger without new RUNSTATS. The keyword VOLATILE can be specified on the CREATE TABLE or the ALTER TABLE statements. If specified, you are basically forcing an access path of index access and no list prefetch.

Volatile tables also improve support for cluster tables. Cluster tables are those tables that have groups or clusters of data that logically belong together. Within each group, rows need to be accessed in the same sequence to avoid lock contention during concurrent access. The sequence of access is determined by the primary key, and if DB2 changes the access path lock contention can occur. To best support cluster tables (volatile tables) DB2 will use index only access when possible. This will minimize application contention on cluster tables by preserving the access sequence by primary key. We need to be sure indexes are available for single table access and joins.

Clone Tables
In DB2 9 you can create a clone table on an existing base table at the current server by using the ALTER TABLE statement. You can create a clone table only if the base table is in a universal table space. The creation or drop of a clone table does not impact applications accessing base table data. No base object quiesce is necessary, and this process does not invalidate plans, packages, or the dynamic statement cache. To create a clone table, issue an ALTER TABLE statement with the ADD CLONE option:

ALTER TABLE base-table-name ADD CLONE clone-table-name

The schema (creator) for the clone table will be the same as for the base table. Although ALTER TABLE syntax is used to create a clone table, the authorization granted as part of the clone creation process is the same as you would get during regular CREATE TABLE processing.

You can exchange the base and clone data by using the EXCHANGE statement. To exchange table and index data between the base table and clone table, issue an EXCHANGE statement with the DATA BETWEEN TABLE table-name1 AND table-name2 syntax, which immediately changes the data contained in the base and clone tables and indexes. No data movement actually takes place. The instance numbers in the underlying VSAM data sets for the objects (tables and indexes) do change, and this has the effect of changing the data that appears in the base and clone tables and their indexes. This is in essence a method of performing an online load replace! After a data exchange, the base and clone table names remain the same as they were prior to the data exchange, but each time that an exchange happens, the instance numbers that represent the base and the clone objects change. For example, a base table exists with an *.I0001.* data set name, and when the table is cloned the clone's data set is initially named *.I0002.*. After an exchange, the base objects are named *.I0002.* and the clones are named *.I0001.*. This could be confusing. You should also be aware of the fact that when the clone is dropped and an uneven number of EXCHANGE statements have been executed, the base table will have an *.I0002.* data set name.
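A short hedged sketch of the full sequence, with hypothetical names; the base table must already reside in a universal table space:

ALTER TABLE ACCT.ACCT_HIST ADD CLONE ACCT.ACCT_HIST_CLONE;

-- load or insert the replacement data into the clone here, then switch:

EXCHANGE DATA BETWEEN TABLE ACCT.ACCT_HIST AND ACCT.ACCT_HIST_CLONE;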

Some Special Table Designs
In situations where you are storing large amounts of data or need to maximize transaction performance there are many creative things you can do. Here are a few alternatives to traditional table design.

UNION in View for Large Table Design
The amount of data being stored in DB2 is increasing in a dramatic fashion. Availability is also an increasingly important feature of our databases. These high demands are pushing database administrators to the edge. They have to build large tables that are easy to access, hold significant quantities of data, are easy to manage, and are available all of the time. So, as we build these giant tables in our system, we have to make sure that they are built in a way that makes them available 24 hours a day, 7 days per week.

Traditionally, our larger database tables have been placed into partitioned tablespaces. Partitioning helps with database management because it's easier to manage several small objects versus one very large object. There are still some limits to partitioning. For example, each partition is limited to a maximum size of 64GB, a partitioning index is required (DB2 V7 only), and if efficient alternate access paths to the data are desired then non-partitioning indexes (NPSIs) are required. These NPSIs are not partitioned, and exist as single large database indexes. Thus NPSIs can present themselves as an obstacle to availability (i.e. a utility operation against a single partition may potentially make the entire NPSI unavailable), and as an impairment to database management, as it is more difficult to manage such large database objects.

A UNION in a view can be utilized as an alternative to table partitioning in support of very large database tables. In this type of design, several database tables can be created to hold different subsets of the data that would have otherwise been held in a single table. Key values, similar to what may be used in partitioning, can be used to determine which data goes into which of the various tables. Take, for example, the following view definition:

CREATE VIEW V_ACCOUNT_HISTORY
  (ACCOUNT_ID, PAYMENT_DATE, PAYMENT_AMOUNT, PAYMENT_TYPE, INVOICE_NUMBER) AS
SELECT ACCOUNT_ID, PAYMENT_DATE, PAYMENT_AMOUNT, PAYMENT_TYPE, INVOICE_NUMBER
FROM   ACCOUNT_HISTORY1
WHERE  ACCOUNT_ID BETWEEN 1 AND 100000000
UNION ALL
SELECT ACCOUNT_ID, PAYMENT_DATE, PAYMENT_AMOUNT, PAYMENT_TYPE, INVOICE_NUMBER
FROM   ACCOUNT_HISTORY2
WHERE  ACCOUNT_ID BETWEEN 100000001 AND 200000000
UNION ALL
SELECT ACCOUNT_ID, PAYMENT_DATE, PAYMENT_AMOUNT, PAYMENT_TYPE, INVOICE_NUMBER
FROM   ACCOUNT_HISTORY3
WHERE  ACCOUNT_ID BETWEEN 200000001 AND 300000000
UNION ALL
SELECT ACCOUNT_ID, PAYMENT_DATE, PAYMENT_AMOUNT, PAYMENT_TYPE, INVOICE_NUMBER
FROM   ACCOUNT_HISTORY4
WHERE  ACCOUNT_ID BETWEEN 300000001 AND 999999999;

By separating the data into different tables, and creating the view over the tables, we can create a logical account history table with these distinct advantages over a single physical table:

• We can add or remove tables with very small outages, usually just the time it takes to drop and recreate the view.
• Each underlying table could be as large as 16TB, logically setting the size limit of the table represented by the view at 64TB.
• Each underlying table could be clustered differently, or could be a segmented or partitioned tablespace.
• We can partition each of the underlying tables, creating still smaller physical database objects.
• NPSIs on each of the underlying tables could be much smaller and easier to manage than they would be under a single table design.
• Utility operations could execute against an individual underlying table, or just a partition of that underlying table. This greatly shrinks utility times against these individual pieces, and improves concurrency. This truly gives us full partition independence.
• The view can be referenced in any SELECT statement in exactly the same way as a physical table would be.
• DB2 will distribute predicates against the view to every query block within the view, and then compare the predicates. Any impossible predicates will result in the query block being pruned (not executed). This is an excellent performance feature.
• DB2 can apply query block pruning for literal values and host variables, but not joined columns. For that reason, if you expect query block pruning on a joined column then you should code a programmatic join (this is the only case). Also, in some cases the pruning does not always work for host variables, so you need to test.

There are some limitations to UNION in a view that should be considered:

• The view is read-only. This means you'd have to utilize special program logic and possibly even dynamic SQL to perform inserts, updates, and deletes against the base tables. If you are using DB2 9, you can possibly utilize INSTEAD OF triggers to provide this functionality.
• When using UNION in a view you'll want to keep the number of tables in a query to a reasonable number. This is especially true for joining, because DB2 will distribute the join to each query block. This multiplies the number of tables in the query, and this can increase both bind time and execution time. In general, you'll want to keep the number of tables UNIONed in the view to 15 or under. Also, you could in some cases exceed the 225 table limit, in which case DB2 will materialize the view.
• Predicate transitive closure happens after the join distribution. So, if you are joining from a table to a UNION in a view and predicate transitive closure is possible from the table to the view, then you have to code the redundant predicate yourself.

Append Processing for High Volume Inserts
In situations in which you have very high insert rates you may want to utilize append processing. Append processing can be turned on for DB2 V7 or DB2 V8 by setting the table space options PCTFREE 0 FREEPAGE 0 MEMBER CLUSTER. Make sure to have APARs PQ86037 and PQ87381 applied. In DB2 9 append processing is turned on via the APPEND YES option of the CREATE TABLE statement. When append processing is turned on, DB2 will use an alternate insert algorithm whose intention is to simply add data to the end of a partition or a table space. DB2 will not respect cluster during inserts and will simply put all new data at the end of the table space. If you have a single index for read access, then having append processing may mean more random reads. This may require more frequent REORGs to keep the data organized for read access. You'll need to make sure you have adequate free space to avoid index page splits. Also, if you are partitioning, and the partitioning key is not the read index, then you will still have random reads during insert to your non-partitioned index.

Append processing can also be used to store historical or seldom read audit information. In these cases you would want to partition based upon an ever ascending value (e.g. a date) and have all new data go to the end of the last partition. In this situation all table space maintenance, such as copies and REORGs, will be against the last partition. All other data will be static and will not require maintenance.

Building an Index on the Fly
In situations in which you are storing data via a key designed for a high speed insert strategy with minimal table maintenance you'd also like to try to avoid secondary indexes. You will possibly need a secondary index, or each read query will have to be for a range of values within the ascending key domain. Scanning billions of rows typically is not an option. The solution may be to build a look-up table that acts as a sparse index. This look-up table will contain nothing more than your ascending key values. One example would be dates, say one date per month for every month possible in our database. Read access is performed by constructing a key during SELECT processing. Using user-supplied dates as starting and ending points, the look-up table can be used to fill the gap with the dates in between. If the historical data is organized and partitioned by the date, and we have only one date per month (to further sub-categorize the data), then we can use our new sparse index to access the data we need. This gives us the initial path to the history data. So, in this example we'll access an account history table (ACCT_HIST) that has a key on ACCT_ID and HIST_EFF_DTE, and our date lookup table called ACCT_HIST_DATES, which contains one column and one row for each legitimate date value corresponding to the HIST_EFF_DTE column of the ACCT_HIST table.
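A rough sketch of the two tables just described; the column definitions are assumptions, and the APPEND YES clause applies only to DB2 9:

CREATE TABLE ACCT_HIST
  (ACCT_ID       DECIMAL(11,0)  NOT NULL,
   HIST_EFF_DTE  DATE           NOT NULL,
   PAYMENT_AMT   DECIMAL(11,2)  NOT NULL)
  IN ACCTDB.ACCTHIST
  APPEND YES;

CREATE TABLE ACCT_HIST_DATES
  (EFF_DTE DATE NOT NULL)
  IN ACCTDB.ACCTDTE;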

CURRENT DATA ACCESS
Current data access is easy; we can retrieve the account history data directly from the account history table.

SELECT {columns}
FROM   ACCT_HIST
WHERE  ACCT_ID = :ACCT-ID
AND    HIST_EFF_DTE = :HIST-EFF-DTE;

RANGE OF DATA ACCESS
Accessing a range of data is a little more complicated than simply getting the most recent account history data. Here we need to use our sparse index history date table to build the key on the fly. We apply the range of dates to the date range table, and then join that to the history table.

SELECT {columns}
FROM   ACCT_HIST HIST
       INNER JOIN ACCT_HIST_DATES DTE
       ON HIST.HIST_EFF_DTE = DTE.EFF_DTE
WHERE  HIST.ACCT_ID = :ACCT-ID
AND    HIST.HIST_EFF_DTE BETWEEN :BEGIN-DTE AND :END-DTE;

FULL DATA ACCESS
To access all of the data for an account we simply need a version of the previous query without the date range predicate.

SELECT {columns}
FROM   ACCT_HIST HIST
       INNER JOIN ACCT_HIST_DATES DTE
       ON HIST.HIST_EFF_DTE = DTE.EFF_DTE
WHERE  HIST.ACCT_ID = :ACCT-ID;

Denormalization "Light"
In many situations, especially those in which there is a conversion from a legacy flat file based system to a relational database, the performance of reading multiple tables compared to the equivalent single record read just isn't good enough. These are situations in which the application is expecting to read all of the data that was once represented by a single record, but is now in many DB2 tables. In these situations many people will begin denormalizing the tables. This is an act of desperation! Remember the reason you're moving your data into DB2 in the first place, and that is for all the efficiency, flexibility, portability, and faster time to delivery for your new applications. By denormalizing you are throwing these advantages away, and you may as well have stayed with your old flat file based design.

In some situations, however, there is a performance concern (or more importantly a performance problem in that an SLA is not being met) for reading the multiple DB2 tables. Well, instead of denormalization you could possibly employ a "denormalization light" instead. This type of denormalization can be applied to parent and child tables when the child table data is in an optional relationship to the parent. Instead of denormalizing the optional child table data into the parent table, simply add a column to the parent table that basically indicates whether or not the child table has any data for that parent key. This will require some additional application responsibility in maintaining that indicator column. However, DB2 can utilize a during join predicate to avoid probing the child table when there is no data for the parent key.

Take, for example, an account table and an account history table. The account may or may not have account history, and so the following query would join the two tables together to list the basic account information (balance) along with the history information if present:

SELECT A.CURR_BAL, B.DTE, B.AMOUNT
FROM   ACCOUNT A
       LEFT OUTER JOIN ACCT_HIST B
       ON A.ACCT_ID = B.ACCT_ID
ORDER  BY B.DTE DESC

In the example above the query will always probe the account history table in support of the join, whether or not the account history table has data. We can employ our light denormalization by adding an indicator column to the account table. Then we can use a during join predicate. DB2 will only perform the join operation when the join condition is true, and will supply nulls for the account history table when data is not present as indicated. In this particular case the access to the account history table is completely avoided when the indicator column has a value not equal to 'Y':

SELECT A.CURR_BAL, B.DTE, B.AMOUNT
FROM   ACCOUNT A
       LEFT OUTER JOIN ACCT_HIST B
       ON A.ACCT_ID = B.ACCT_ID
       AND A.HIST_IND = 'Y'
ORDER  BY B.DTE DESC

DB2 is going to test that indicator column first before performing the join operation. You can imagine now the benefit of this type of design when you are doing a legacy migration from a single record system to 40 or so relational tables with lots of optional relationships. This form of denormalizing can really improve performance in support of legacy system access, while maintaining the relational design for efficient future applications.
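For completeness, the indicator column used in the query above could be added with a simple ALTER; the default value of 'N' is an assumption:

ALTER TABLE ACCOUNT
  ADD HIST_IND CHAR(1) NOT NULL WITH DEFAULT 'N';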

CHAPTER 5
EXPLAIN and Predictive Analysis

It is important to know something about how your application will perform prior to the application actually executing in a production environment. The important thing you have to ask when you begin building your application is "Is the performance important?" If not, then proceed with development at a rapid pace, and then fix the performance once the application has been implemented. What if you can't wait until implementation to determine performance? Well, then you're going to have to predict the performance. There are several ways in which we can predict the performance of our applications prior to implementation, and several tools can be used. This chapter will suggest ways to do that.

EXPLAIN Facility
The DB2 EXPLAIN facility is used to expose query access path information. This enables application developers and DBAs to see what access path DB2 is going to take for a query, and if any attempts at query tuning are needed. DB2 can gather basic access path information in a special table called the PLAN_TABLE (DB2 V7, DB2 V8, DB2 9), as well as detailed information about predicate stages, predicate matching, filter factor, and dynamic statements that are cached (DB2 V8, DB2 9). EXPLAIN can be invoked in one of the following ways:

• Executing the EXPLAIN SQL statement for a single statement
• Specifying the BIND option EXPLAIN(YES) for a plan or package bind
• Executing the EXPLAIN via the Optimization Service Center or via Visual EXPLAIN

When EXPLAIN is executed it can populate many EXPLAIN tables. The target set of EXPLAIN tables depends upon the authorization id associated with the process. That is, the creator (schema) of the EXPLAIN tables is determined by the CURRENT SQLID of the person running the EXPLAIN statement, or the OWNER of the plan or package at bind time. The EXPLAIN tables are optional, and DB2 will only populate the tables that it finds under the SQLID or OWNER of the process invoking the EXPLAIN. There are many EXPLAIN tables, which are documented in the DB2 Performance Monitoring and Tuning Guide (DB2 9). Some of these tables are not available in DB2 V7. The following tables can be defined manually, and the DDL can be found in the DB2 sample library member DSNTESC:

• PLAN_TABLE The PLAN_TABLE contains basic access path information for each query block of your statement. This includes, among other things, information about index usage, the number of matching index columns, which access method is utilized, which join method is utilized, and whether or not a sort will be performed. The PLAN_TABLE forms the basis for access path determination.

• DSN_STATEMNT_TABLE The statement table contains estimated cost information for the cost of a statement. If the statement table is present when you EXPLAIN a query, then it will be populated with the cost information that corresponds to the access path information for the query that is stored in the PLAN_TABLE. For a given statement, this table will contain the estimated processor cost in milliseconds, as well as the estimated processor cost in service units. It places the cost values into two categories:
– Category A DB2 had enough information to make a cost estimation without using any defaults.
– Category B DB2 had to use default values to make some cost calculations.
The statement table can be used to compare estimated costs when you are attempting to modify statements for performance. Keep in mind, however, that this is a cost estimate, and is not truly reflective of how your statement will be used in an application (given input values, transaction patterns, etc.). You should always test your statements for performance in addition to using the statement table and EXPLAIN.

• DSN_FUNCTION_TABLE The function table contains information about user-defined functions that are a part of the SQL statement. Information from this table can be compared to the cost information (if populated) in the DB2 System Catalog table, SYSIBM.SYSROUTINES, for the user-defined functions.

• DSN_STATEMENT_CACHE_TABLE This table is not populated by a normal invocation of EXPLAIN, but instead by the EXPLAIN STMTCACHE ALL statement. This table is available only for DB2 V8 and DB2 9. Issuing the statement will result in DB2 reading the contents of the dynamic statement cache, and putting runtime execution information into this table. This includes information about the frequency of execution of these statements, the statement text, the number of rows processed by the statement, lock and latch requests, I/O operations, number of index scans, number of sorts, and much more! This is extremely valuable information about the dynamic queries executing in a subsystem. (A sample invocation follows.)
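As a hedged illustration of putting these tables to work, a single statement can be EXPLAINed and its PLAN_TABLE rows examined, and the dynamic statement cache can be externalized and queried. The QUERYNO value and the statement being explained are arbitrary, and the statement cache column names shown here should be verified against the DSN_STATEMENT_CACHE_TABLE DDL for your DB2 release:

EXPLAIN PLAN SET QUERYNO = 100 FOR
  SELECT LASTNAME, FIRSTNME
  FROM   EMP
  WHERE  WORKDEPT = 'C01';

SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, TNAME, ACCESSTYPE,
       MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
FROM   PLAN_TABLE
WHERE  QUERYNO = 100
ORDER  BY QBLOCKNO, PLANNO;

EXPLAIN STMTCACHE ALL;

SELECT STMT_ID, STAT_EXEC, STAT_CPU, STAT_ELAP, STMT_TEXT
FROM   DSN_STATEMENT_CACHE_TABLE
ORDER  BY STAT_CPU DESC
FETCH FIRST 20 ROWS ONLY;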

EXPLAIN Tables Utilized by Optimization Tools
There are many more EXPLAIN tables than those listed here. These additional EXPLAIN tables are typically populated by the optimization tools that use them; however, at least some of them you can make and use yourself without having to use the various optimization tools. These additional tables are documented in the DB2 Performance Monitoring and Tuning Guide (DB2 9); however, some of them are utilized in Version 8.

What EXPLAIN Does and Does Not Tell You
The PLAN_TABLE is the key table for determining the access path for a query. Here is some of the critical information it provides:

• METHOD This column indicates the join method, or whether or not an additional sort step is required.
• ACCESSTYPE This column indicates whether the access is via a table space scan, or via index access. If it is by index access, the specific type of index access is indicated.
• MATCHCOLS If an index is used for access, this column indicates the number of columns matched against the index.
• INDEXONLY A value of "Y" in this column indicates that the access required could be fully served by accessing the index only, and avoiding any table access.
• SORTN####, SORTC#### These columns indicate any sorts that may happen in support of a UNION, grouping, or a join operation, among others.
• PREFETCH This column indicates whether or not prefetch may play a role in the access.

If you have additional EXPLAIN tables created (those created by Visual Explain among other tools), then those tables are populated automatically either by using those tools, or by manually running EXPLAIN. These tables provide detailed information about such things as predicate stages, filter factor, partitions eliminated, parallel operations, detailed cost information, and more. The DB2 Performance Monitoring and Tuning Guide (DB2 9) documents all of these tables. You can also query those tables manually, especially if you don't have the remote access required from a PC to use those tools. By manually running EXPLAIN, and examining the PLAN_TABLE, you can get good information about the access path, indexes that are used, join operations, and any sorting that may be happening as part of your query.

There is some information, however, that EXPLAIN does not tell you about your queries. You have to be aware of this to effectively do performance tuning and predictive analysis. Here are some of the things you cannot get from EXPLAIN:

• INSERT indexes EXPLAIN does not tell you the index that DB2 will use for an INSERT statement. Therefore, it's important that you understand your clustering indexes, and whether or not DB2 will be using APPEND processing for your inserts. This understanding is important for INSERT performance, and the proper organization of your data. See Chapter 4 for more information in this regard.
• The input values If you are using host variables in your programs, then EXPLAIN knows nothing about the potential input values to those host variables. This makes it important for you to understand these values, what the most common values are, and if there is data skew relative to the input values.
• Predicate evaluation sequence The EXPLAIN tables do not show you the order in which the predicates of the query are actually evaluated, although some of the predicate information is available in the additional EXPLAIN tables. Please see Chapter 3 of this guide for more information on predicate evaluation sequence.
• The SQL statement The SQL statement is not captured in the EXPLAIN tables. If you EXPLAINed a statement dynamically, then you know what the statement looks like. However, if you've EXPLAINed a package or plan, then you are going to need to see the program source code.
• The statistics used The optimizer used catalog statistics to help determine the access path at the time the statement was EXPLAINed. Unless you have historical statistics that happen to correspond to the time the EXPLAIN was executed, you don't know what the statistics looked like at the time of the EXPLAIN, and if they are different now they can change the access path.
• Access path information for enforced referential constraints If you have INSERTS, UPDATES, and DELETES in the program you have EXPLAINed, then any database enforced RI relationships and associated access paths are not exposed in the EXPLAIN tables. Therefore, it is your responsibility to make sure that proper indexes in support of the RI constraints are established and in use.

• The order of input to your transactions Sure, the SQL statement looks OK from an access path perspective, but what is the order of the input data to the statement? What ranges are being supplied? How many transactions are being issued? Is it possible to order the input or the data in the tables in a manner in which it's most efficient? This is not covered in EXPLAIN output.
• The program source code In order to fully understand the impact of the access path that a statement has used you need to see how that statement is being used in the application program. The EXPLAIN output may show a perfectly good and efficient access path, but the statement itself could be completely unnecessary. How many times will the statement be executed? Is the statement in a loop? Can we avoid executing the statement? Is the program issuing 100 statements on individual key values when a range predicate and one statement would suffice? Is the program performing programmatic joins? These are questions that can only be answered by looking at the program source code (this is where a tool, or even a trace, can be very helpful in order to verify if the SQL statements executed really are what's expected). So, you should always be looking at the program source code of the program the statement is embedded in. These things are discussed further throughout this guide.

Optimization Service Center and Visual EXPLAIN
As the complexity of managing and tuning various workloads continues to escalate, many database administrators (DBAs) are falling farther behind in their struggle to maintain quality of service while also keeping costs in check. There are many tools currently available (such as IBM Statistics Advisor, IBM Query Advisor, etc.) designed to ease the workload of DBAs through a rich set of autonomic tools that help optimize query performance and workloads. You can use your favorite tool to identify and analyze problem SQL statements and to receive expert advice about statistics that you can gather to improve the performance of problematic and poorly performing SQL statements on a DB2 subsystem. The OSC provides:

• The ability to snap the statement cache
• Collect statistics information
• Analyze indexes
• Group statements into workloads
• Monitor workloads
• An easy-to-understand display of a selected access path
• Suggestions for changing a SQL statement
• An ability to invoke EXPLAIN for dynamic SQL statements
• An ability to provide DB2 catalog statistics for referenced objects of an access path, or for a group of statements
• A subsystem parameter browser with keyword find capabilities
• The ability to graphically create optimization hints (a feature not found in the V8 Visual Explain product)

The OSC can be used to analyze previously generated explains, or to gather explain data and explain dynamic SQL statements. The OSC is available for DB2 V8 and DB2 9. If you are using DB2 V7, then you can use the DB2 Visual Explain product, which provides a subset of the functionality of the OSC.

DB2 Estimator
The DB2 Estimator product is another useful predictive analysis tool. This product runs on a PC, and provides a graphical user interface for entering information about DB2 tables, indexes, SQL statements, access paths, and queries. Table and index definitions and statistics can be entered directly, or imported via DDL files. SQL statements can be imported from text files. SQL statements can be organized into transactions, and transaction frequencies can be set. Once all of this information is input into Estimator, along with information about DASD models and CPU size, capacity reports can be produced. These reports will contain estimates of the DASD required, as well as the amount of CPU required for an application. The table definitions and statistics can be used to accurately predict database sizes. These reports are very helpful during the initial stage of capacity planning, before any actual real programs or test data is available. The DB2 Estimator product will no longer be available after DB2 V8, and so you should download it today!

Predicting Database Performance
Lots of developer and DBA time can be spent in the physical design of a database, and database queries, based upon certain assumptions about performance. A lot of time is spent in meetings where people are saying things such as "Well that's not going to work", "I think the database will choose this access path", or "We want everything as fast as possible." Then, when the application is finally implemented, and performs horribly, people scratch their heads saying "…but we thought of everything!" A lot of this time is wasted because the assumptions being made are not accurate.

Another approach for large systems design is to spend little time considering performance during the development and set aside project time for performance testing and tuning. That is, let the logical designers do their thing, and let the database have a first shot at deciding the access path. This frees the developers from having to consider performance in every aspect of their programming, and gives them incentive to code more logic in their queries. This makes for faster development time, and more logic in the queries means more opportunities for tuning (if all the logic was in the programs, then tuning may be a little harder to do).

If you choose to make performance decisions during the design phase, then each performance decision should be backed by solid evidence, and not assumptions. It's much easier to make design decisions based upon actual test results. There is no reason why a slightly talented DBA or developer can't make a few test tables, generate some test data, and write a small program or two to simulate a performance situation and test their assumptions about how the database will behave. Reports can then be generated, and given to managers. This gives the database the opportunity to tell you how to design for performance. Tools that you can use for testing statements, design ideas, or program processes include, but are not limited to:

• REXX DB2 programs
• COBOL test programs
• Recursive SQL to generate data
• Recursive SQL to generate statements
• Data in tables to generate more data
• Data in tables to generate statements

Generating Data
In order to simulate program access you need data in tables. You could simply type some data into INSERT statements, insert them into a table, and then use data from that table to generate more data. Say, for example, that you have to test various program processes against a PERSON_TABLE table and a PERSON_ORDER table. No actual data has been created yet, but you need to test the access patterns of incoming files of orders. You can key some INSERT statements for the parent table, PERSON_TABLE, and then use the parent table to propagate data to the child table. For example, if the parent table, PERSON_TABLE, contained this data:

PERSON_ID   NAME
1           JOHN SMITH
2           BOB RADY

Then the following statement could be used to populate the child table, PERSON_ORDER, with some test data:

INSERT INTO PERSON_ORDER
  (PERSON_ID, ORDER_NUM, PRDT_CDE, QTY, PRICE)
SELECT PERSON_ID, 1, 'B100', 10, 14.95
FROM   YLA.PERSON_TABLE
UNION ALL
SELECT PERSON_ID, 2, 'B120', 3, 1.95
FROM   YLA.PERSON_TABLE

The resulting PERSON_ORDER data would look like this:

PERSON_ID   ORDER_NUM   PRDT_CDE   QTY   PRICE
1           1           B100       10    14.95
1           2           B120       3     1.95
2           1           B100       10    14.95
2           2           B120       3     1.95

The statements could be repeated over and over to add more data, or additional statements can be executed against the PERSON_TABLE to generate more PERSON_TABLE data.

Recursive SQL (DB2 V8, DB2 9) is an extremely useful way to generate test data. We can use the power of recursive SQL to generate mass quantities of data that can then be inserted into DB2 tables. Take a look at the following simple recursive SQL statement:

WITH TEMP(N) AS
 (SELECT 1
  FROM   SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT N+1
  FROM   TEMP
  WHERE  N < 10)
SELECT N
FROM   TEMP

This statement generates the numbers 1 through 10, one row each. The following is a piece of a SQL statement that was used to insert 300,000 rows of data into a large test lookup table. The table was quickly populated with data, ready for testing, and a test was conducted to determine the performance. It was quickly determined that the performance of this large lookup table would not be adequate, but that couldn't have been known for sure without testing:

WITH LASTPOS (KEYVAL) AS
 (VALUES (0)
  UNION ALL
  SELECT KEYVAL + 1
  FROM   LASTPOS
  WHERE  KEYVAL < 9)
,STALETBL (STALE_IND) AS
 (VALUES 'S', 'F')
SELECT STALE_IND, KEYVAL,
       CASE STALE_IND
         WHEN 'S' THEN
           CASE KEYVAL
             WHEN 0 THEN 1  WHEN 1 THEN 2  WHEN 2 THEN 3  WHEN 3 THEN 4
             WHEN 4 THEN 4  WHEN 5 THEN 6  WHEN 6 THEN 7  WHEN 7 THEN 8
             WHEN 8 THEN 9  WHEN 9 THEN 10
           END
         WHEN 'F' THEN
           CASE KEYVAL
             WHEN 0 THEN 11 WHEN 1 THEN 12 WHEN 2 THEN 13 WHEN 3 THEN 14
             WHEN 4 THEN 15 WHEN 5 THEN 16 WHEN 6 THEN 17 WHEN 7 THEN 18
             WHEN 8 THEN 19 WHEN 9 THEN 20
           END
       END AS PART_NUM
FROM   LASTPOS
       INNER JOIN STALETBL
       ON 1=1;

Generating Statements
Just as data can be generated, so can statements. You can write SQL statements that generate statements. Say, for example, that you needed to generate singleton select statements against the EMP table to test a possible application process or scenario. You could possibly write a statement such as this to generate those statements:

SELECT 'SELECT LASTNAME, FIRSTNME ' CONCAT
       'FROM EMP WHERE EMPNO = ''' CONCAT EMPNO CONCAT ''''
FROM   SUSAN.EMP
WHERE  WORKDEPT IN ('C01', 'E11')
AND    RAND() < .33

The above query will generate SELECT statements for approximately 33% of the employees in departments "C01" and "E11". The output would look something like this:

1
---------------------------------------------------------
SELECT LASTNAME, FIRSTNME FROM EMP WHERE EMPNO = '000030'
SELECT LASTNAME, FIRSTNME FROM EMP WHERE EMPNO = '000130'
SELECT LASTNAME, FIRSTNME FROM EMP WHERE EMPNO = '200310'

You could also use recursive SQL statements to generate statements. The following statement was used during testing of high performance INSERTs to an account history table; it generated 50,000 random insert statements:

WITH GENDATA (ACCT_ID, HIST_EFF_DTE, ORDERVAL) AS
 (VALUES (CAST(2 AS DEC(11,0)), CAST('2003-02-01' AS DATE), CAST(1 AS FLOAT))
  UNION ALL
  SELECT ACCT_ID + 5, HIST_EFF_DTE, RAND()
  FROM   GENDATA
  WHERE  ACCT_ID < 249997)
SELECT 'INSERT INTO YLA.ACCT_HIST (ACCT_ID, HIST_EFF_DTE)' CONCAT
       ' VALUES(' CONCAT CHAR(ACCT_ID) CONCAT ',' CONCAT
       '''' CONCAT CHAR(HIST_EFF_DTE, ISO) CONCAT '''' CONCAT ');'
FROM   GENDATA
ORDER  BY ORDERVAL;

Determining Your Access Patterns
If you have access to input data or input transactions then it would be wise to build small sample test programs to determine the impact of these inputs on the database design. This would involve simple REXX or COBOL programs that contain little or no business logic, but instead just simple queries that simulated the anticipated database access. Running these programs could then give you ideas as to the impact of random versus sequential processing, or the impact of sequential processing versus skip sequential processing, as well as whether or not performance enhancers such as sequential detection or index lookaside will be effective. It could also give you an idea of how buffer settings could impact performance.

Simulating Access with Tests
Sitting around in a room and debating access paths and potential performance issues is a waste of time. Write a simple COBOL program to try access to a table to determine what the proper clustering is, whether or not one index performs over another, how effective compression will be for performance, and whether or not adding an index will adversely impact INSERTs and DELETEs.

In one situation it was debated as to whether or not an entire application interface should utilize large joins between parent and child tables, or whether all access should be via individual table access (programmatic joins) for the greatest flexibility. Coding for both types of access would be extra programming effort, but what is the cost of the programmatic joins for this application? Two simple COBOL programs were coded against a test database, one with a two table programmatic join, and the other with the equivalent SQL join. It was determined that the SQL join consumed 30% less CPU than the programmatic join. Each potential disagreement among DBAs and application developers should be tested, the evidence analyzed, and a decision made.

CHAPTER 6
Monitoring

It's critical to design for performance when building applications. Now, your application is in production and is running fine. You've designed the correct SQL, avoided programmatic joins, clustered commonly accessed tables in the same sequence, and have avoided inefficient repeat processing. Is there more you can save? Most certainly! Which statement is the most expensive? Is it the tablespace scan that runs once per day, or the matching index scan running millions of times per day? Are all your SQL statements subsecond responders, and so you don't need tuning? What is the number one statement in terms of CPU consumption? All of these questions can be answered by monitoring your DB2 subsystems, databases, the applications accessing them, and SQL statements.

DB2 provides facilities for monitoring the behavior of the subsystem, as well as the applications that are connected to it. This is primarily via the DB2 trace facility. In this chapter we'll discuss the traces that are important for monitoring performance, as well as how to use them effectively for proactive performance tuning.

DB2 Traces
DB2 provides a trace facility to help track and record events within a DB2 subsystem. DB2 has several different types of traces, and these traces should play an integral part in your performance monitoring process. There are six types of traces:

• Statistics
• Accounting
• Audit
• Performance
• Monitor
• Global

This chapter will cover the statistics, accounting and performance traces as they apply to performance monitoring.

4. deadlocks. and most often should be the starting point for analysis. DB2 times are classified as follows: • Class 1: This time shows the time the application spent since connecting to DB2. then class 1 is the default class.com . the amount of SQL statements executed. including such things as: • Start and stop times • Number of commits and aborts • The number of times certain SQL statements are issued • Number of buffer pool requests • Counts of certain locking events • Processor resources consumed • Thread wait times for various events • RID pool processing • Distributed processing • Resource limit facility statistics Accounting times are usually the prime indicator of performance problems. and amount of storage consumed within the database manager address space. The interval is typically 10 or 15 minutes per record. and 6 are the default classes for the statistics trace if statistics is specified yes in installation panel DSNTIPN. 3. It is divided into CPU time and waiting time. 5. 82 ca. Accounting Trace The accounting trace provides data that allows you to assign DB2 costs to individual authorization IDs and to tune individual programs. • Class 3: This elapsed time is divided into various waits. Statistics trace classes 1. buffer pool utilization. timeouts. such as the duration of suspensions due to waits for locks and latches or waits for I/O. The DB2 accounting trace provides information related to application programs. This information is collected at regular intervals for an entire DB2 subsystem. logging. The statistics trace can collect information about the number of threads connected.MONITORING Statistics Trace The data collected in the statistics trace allows you to conduct DB2 capacity planning and to tune the entire set of DB2 programs. The statistics trace reports information about how much the DB2 system services and database services are used. including time spent outside DB2. It is a system wide trace and should not be used for charge-back accounting. and much more. If the statistics trace is started using the START TRACE command. • Class 2: This shows the elapsed time spent in DB2.

locking. as well as the overall application time. Class 2 time is a component of class 1 time. Be aware that this can add between 4% and 5% of your overall system CPU consumption. trace class. and many other detailed activities. or 8. On the other hand. Accounting data for class 1 (the default) is accumulated by several DB2 components during normal execution.3.2. Rather. Performance trace data can also be written to SMF records. it does not involve as much overhead as individual event tracing. or sent to IBM for analysis. and batch reporting tools can read those records to produce very detailed information about the execution of SQL statements in the application. Performance traces cannot be automatically started. This includes limited the trace data collected by plan. They also collect a very large volume of information. Accounting class 1 must be active to externalize the information.8. package. but these traces are not written to any external destination. and consume a lot of CPU. If that is the case then adding SMF as a destination for these classes should not add any CPU overhead. When an application connects to DB2 it is executing across address spaces. if you are already writing accounting classes 1. We recommend you set accounting classes 1. and even IFCID.2. and the DB2 address spaces are shared by perhaps thousands of users across many address spaces. then adding 7 and 8 typically should not add much overhead. The accounting trace provides information about the time spent within DB2. the accounting facility uses these traces to compute the additional total statistics that appear in the accounting record when class 2 or class 3 is activated. you must use the –START TRACE(PERFM) command. and can detail SQL performance. Performance traces are expensive to run. Having the accounting trace active is critical for proper performance monitoring. when you start class 2. Performance Trace The performance trace provides information about a variety of DB2 events. Also. Because performance traces can consume a lot of resources and generate a lot of data. and tuning. You can use this information to further identify a suspected problem or to tune DB2 programs and resources for individual users or for DB2 as a whole. analysis. Every occurrence of these events is traced internally by DB2 trace. there are a lot of options when starting the trace to balance the information desired with the resources consumed. This data is then collected at the end of the accounting period. many additional trace points are activated. CA PERFORMANCE HANDBOOK FOR DB2 FOR z/OS 83 .7. However.3.MONITORING DB2 trace begins collecting this data at successful thread allocations to DB2 and writes a completed record when the thread terminates or in some cases when the authorization ID changes. 3. and class 3 time a component of class 2 time. Reports can then be produced by the monitor software. if you are using an online performance monitor then it could already have these classes started. To start a performance trace. 7. Performance traces are usually run via an online monitor tool. including events related to distributed data processing. Performance traces are typically utilized by online monitor tools to track a specific problem for a given plan or package. or the output from the performance trace can be sent to SMF and then analyzed using a monitor reporting tool.

Accounting and Statistics Reports
Once DB2 trace information has been collected you can create reports by reading the SMF records. These reports are typically produced by running reporting software provided by a performance reporting product. Your reporting software should be able to produce either summary reports, which can gather and summarize the data for a period of time, or detail reports, which can report on every statistics interval. These reports are a critical part of performance analysis and tuning. You should get familiar with the statistics and accounting reports and the valuable information they provide.

Statistics Report
Since statistics records are collected typically at 10 minute or 15 minute intervals, quite a few records can be collected on a daily basis. They are the best gauge as to the health of a DB2 subsystem. They are also the best way to monitor performance trends, and to proactively detect potential problems before they become critical. Start with a daily summary report, and look for specific problems within the DB2 subsystem. Once you detect a problem you can produce a detailed report to determine the specific period of time that the problem occurred, and also coordinate the investigation with detailed accounting reports for the same time period in an effort to attribute the problem to a specific application or process.

Some of the things to look for in a statistics report:

• RID Pool Failures There should be a section of the statistics report that reports the usage of the RID pool for things such as list prefetch, hybrid join, and multiple index access. The report will also indicate RID failures. There can be RDS failures, DM failures, and failures due to insufficient size. If you are getting failures due to insufficient storage you can increase the RID pool size. However, if you are getting RDS or DM failures in the RID pool then there is a good chance that the access path selected is reverting to a table space scan. In these situations it is important to determine which applications are getting these RID failures. Therefore, you need to produce a detailed statistics report that can identify the time of the failures, and also produce detailed accounting reports that will show which threads are getting the failures. Further investigation will have to be performed to determine the packages within the plan, and DB2 EXPLAIN can be used to determine which statements are using list prefetch, hybrid joins, or multi-index access. You may have to test the queries to determine if they are indeed the ones getting these failures, and if they are you'll have to try to influence the optimizer to change the access path (see Chapter 3 for SQL tuning).

• Bufferpool Issues One of the most valuable pieces of information coming out of the statistics report is the section covering buffer utilization and performance. For each buffer pool in use the report will include the size of the pool, sequential and random getpages, pages written, prefetch operations, random I/O's, number of sequential I/O's, and write I/O's, as well as the buffer hit ratio, plus much more. One of the things to watch for is the number of synchronous reads for sequential access, which may be an indication that the number of pages is too small and pages for a sequential prefetch are stolen before they are used. It's also important to monitor the number of getpages per synchronous I/O. Another thing to watch is whether or not any critical thresholds are reached, if there are write engines not available, and whether or not deferred write thresholds are triggering. Also reported are the number of times certain buffer thresholds have been hit, and the applications using the pools. Please see Chapter 8 for information about subsystem tuning.
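As a supplementary check (not something the statistics report itself provides), a point-in-time view of a buffer pool's size, thresholds, and I/O counters can also be obtained with a DB2 display command, for example:

-DISPLAY BUFFERPOOL(BP1) DETAIL

The buffer pool name here is illustrative; the statistics report remains the better tool for trending this information over time.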

• Logging Problems The statistics report will give important information about logging. This includes the number of system checkpoints, the number of reads from the log buffer, active log data sets, or archived log data sets, the number of unavailable output buffers, and total log writes. This could give you an indication as to whether or not you need to increase log buffer sizes, or investigate frequent application rollbacks or other activities that could cause excessive log reads. Please see Chapter 8 for information about subsystem tuning.

• Deadlocks and Timeouts The statistics report will give you a subsystem wide perspective on the number of deadlocks and timeouts your applications have experienced. You can use this as an overall method of detecting deadlocks and timeouts across all applications. If the statistics summary report shows a positive count you can use the detailed report to find out at what time the problems are occurring. You can also use accounting reports to determine which applications are experiencing the problem.

• EDM Pool Hit Ratio The statistics report will show how often database objects such as DBD's, cursor tables, and package tables are requested, as well as how often those requests have to be satisfied via a disk read to one of the directory tables. You can use this to determine if the EDM pool size needs to be increased. You also get statistics about the use of the dynamic statement cache and the number of times statement access paths were reused in the dynamic statement cache. This could give you a good indication about the size of your cache and its effectiveness, but it could also give you an indication of the potential reusability of the statements in the cache. Please see Chapter 8 for more information about the EDM pool, and Chapter 3 for information about tuning dynamic SQL.

This has only been a sample of the fields on the statistics report. You should be using your statistics reports on a regular basis, using your monitoring software documentation along with the DB2 Administration Guide (DB2 V7, DB2 V8) or DB2 Performance Monitoring and Tuning Guide (DB2 9) to interpret the information provided.

Accounting Report
An accounting report will read the SMF accounting records to produce thread or application level information from the accounting trace. These reports typically can summarize information at the level of a plan, package, authorization id, correlation id, and more. If you have accounting classes 1, 2, 3, 7, and 8 turned on, then the information will be reported at both the plan and package level. In addition, you also have the option to produce one report per thread. This accounting detail report can give very detailed performance information for the execution of a specific application process.

You can use the accounting report to find specific problems within certain applications, programs, or threads. Some of the things to look for in an accounting report include:

• Class 1 and Class 2 Timings Class 1 times (elapsed and CPU) include the entire application time, including the time spent within DB2. Class 2 is a component of class 1, and represents the amount of time the application spent in DB2. The first question to ask when an application is experiencing a performance problem is "where is the time being spent?" The first indication of the performance issue being DB2 will be a high class 2 time relative to the class 1 time. Within class 2 you could be having a CPU issue (CPU time represents the majority of class 2 time), or a wait issue (CPU represents very little of the overall class 2 time, but class 3 wait represents most of the time), or maybe your entire system is CPU bound (class 2 overall elapsed is not reflected in class 2 CPU and class 3 wait time combined).

• Buffer Usage The accounting report contains buffer usage at the thread level. This information can be used to determine how a specific application or process is using the buffer pools. You can use this thread level information to determine which applications are accessing buffers randomly versus sequentially. If you have situations in which certain buffers have high random getpage counts you may want to look at which applications are causing those high numbers of random getpages. Then perhaps you can see which objects the application uses, and use this information to separate sequentially accessed objects from randomly accessed objects into different buffer pools (see Chapter 8 on subsystem tuning). The buffer pool information in the accounting report will also indicate just how well the application is utilizing the buffers. The report can be used during buffer pool tuning to determine the impact of buffer changes on an application.

• Excessive Synchronous I/O's Do you have a slow running job or online process? Exactly what is slow about that job or process? The accounting report will tell you if there are a large number of excessive random synchronous I/O's being issued, and how much time the application spends waiting on I/O. This is important in determining if you have access path problems in a specific application. The information in the report can also be used to approximately determine the efficiency of your DASD by simply dividing the number of synchronous I/O's into the total synchronous I/O wait time.

• RID Failures The accounting report does give thread level RID pool failure information.

• Deadlocks, Timeouts, and Lock Waits The accounting report includes information about the number of deadlocks and timeouts that occurred on a thread level. It also reports the time the thread spent waiting on locks. This will give you a good indication as to whether or not you need to do additional investigation into applications that are having locking issues.

• Package Execution Times If accounting classes 7 and 8 are turned on then the accounting report will show information about the DB2 processing on a package level. This information is very important for performance tuning because it allows you to determine which programs in a poorly performing application should be reviewed first.

• High Getpage Counts and High CPU Time Often it is hard to determine if an application is doing repeat processing when there are not a lot of I/O's being issued. If the report shows a very high getpage count, or that the majority of elapsed time is actually class 2 CPU time, then that may be an indication of an inefficient repeat process in one of the programs for the plan. You should use the accounting report to determine if your performance problem is related to an inefficient repeat process. You can use the package level information to determine which program uses the most CPU and I/O activity, and try to identify any inefficiencies in that program. You can then further examine the most expensive programs (identified by package) to look for tuning changes.

Online Performance Monitors
DB2 can write accounting, statistics, and performance trace data to line buffers. This allows online performance reporting software to read those buffers, and monitor and report on DB2 subsystem and application performance in real time. These online monitors can be used to review subsystem activity and statistics, thread level performance information, deadlocks and timeouts, logging and storage, and to dynamically execute and report on performance traces.

Overall Application Performance Monitoring
As much as we hate to hear it, there's no silver bullet for improving overall large system performance. We can tune the DB2 subsystem parameters, but often we're only accommodating a bad situation. Once you do that, you'll eventually have to address application performance. Here the "80/20" rule applies. There is a way to quickly assess application performance and identify the significant offenders and SQL statements causing problems. You can quickly identify the "low-hanging fruit," report on it to your boss, and change the application or database to support a more efficient path to the data. This reporting process can serve as a quick solution to identifying an application performance problem, but it can also be incorporated into a long-term solution that identifies problems and tracks changes. Management support is a must, and an effective manner of communicating performance tuning opportunities and results is crucial.

Setting the Appropriate Accounting Traces
DB2 accounting traces will play a valuable role in reporting on application performance tuning opportunities. DB2 accounting traces 1, 2, 3, 7, and 8 must be set to monitor performance at the package level. There has been some concern about the performance impact of this level of DB2 accounting. The IBM DB2 Administration Guide (DB2 V7, DB2 V8) or the DB2 Performance Monitoring and Tuning Guide (DB2 9) states that the performance impact of setting these traces is minimal and the benefits can be substantial. Tests performed at a customer site demonstrated an overall system impact of 4.3 percent for all DB2 activity when accounting classes 1, 2, 3, 7, and 8 are started. In addition, adding accounting classes 7 and 8 when 1, 2, and 3 are already started has nominal impact, as does the addition of most other performance monitor equivalent traces (i.e. your online monitor software).
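As a sketch (the destination shown is illustrative, and many sites simply set these classes at DB2 startup through the installation panels), the recommended accounting classes could be started with:

-START TRACE(ACCTG) CLASS(1,2,3,7,8) DEST(SMF)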

Summarizing Accounting Data
To effectively communicate application performance information to management, the accounting data must be organized and summarized up to the application level. This information is reduced to the amount of elapsed time and CPU time the application consumes daily and the number of SQL statements each package issues daily. You need a reporting tool that formats DB2 accounting traces from System Management Facilities (SMF) files to produce the type of report you're interested in. Most reporting tools can produce DB2 accounting reports at a package summary level. Some can even produce customized reports that can filter only the information you want out of the wealth of information in trace records. You can process whatever types of reports you produce so that a concentrated amount of information about DB2 application performance can be extracted. The following example is from a package level report with the areas of interest highlighted (Package Name, Total SQL Count, Total DB2 Elapsed, Total DB2 TCB):

[Package level accounting report excerpt for collection CPG2SU01, program FHPRDA2: package executions with average and total DB2 elapsed time, DB2 TCB (CPU) time, SQL counts, stored procedures executed, UDFs scheduled, and the class 3 suspension breakdown (lock/latch, other read and write I/O, log quiesce, drain lock, claim release, archive log read, page latch contention, data set messages, and global contention waits).]

If you lack access to a reporting tool that can filter out just the pieces of information desired, you can write a simple program in any language to read the standard accounting reports and pull out the information you need. REXX is an excellent programming language well-suited to this type of "report scraping," and you can write a REXX program to do such work in a few hours. You could write a slightly more sophisticated program to read the SMF data directly to produce similar summary information if you wish to avoid dependency on the reporting software.

Once the standard reports are processed and summarized, all the information for a specific interval (say one day) can appear in a simple spreadsheet. The following spreadsheet can be derived by extracting the fields of interest from a package level summary report:

[Spreadsheet: one row per package (ACCT001, ACCT002, ACCTB01, RPTS001, ACCT003, RPTS002, HRPK001, HRPKB01, HRBKB01, CUSTB01, CUSTB02, CUST001, CUSTB03, RPTS003, CUST002, CUST003, CUSTB04, CUSTB05) with columns for Executions, Total Elapsed, Total CPU, Total SQL, Elapsed/Execution, CPU/Execution, Elapsed/SQL, and CPU/SQL.]

Look for some simple things to choose the first programs to address. You can sort the spreadsheet by CPU descending. With high consumers at the top of the report, the low hanging fruit is easy to spot. For example, package ACCT001 consumes the most CPU per day, and issues nearly 2 million SQL statements. Although the CPU consumed per statement on average is low, the sheer quantity of statements issued indicates an opportunity to save significant resources. If just a tiny amount of CPU can be saved, it will quickly add up. The same applies to package ACCT002 and packages RPTS001 and RPTS002. These are some of the highest consumers of CPU and they also have a relatively high average CPU per SQL statement. Since the programs consume significant CPU per day, this indicates there may be some inefficient SQL statements involved, and tuning these inefficient statements could yield significant savings.

ACCT001, ACCT002, RPTS001, and RPTS002 represent the best opportunities for saving CPU, so examine those first.

An in-house naming standard can be used to combine all the performance information from various packages into application-level summaries. For example, if the in-house accounting application has a program naming standard where all program names begin with "ACCT," then the corresponding DB2 package accounting information can be grouped by this header. Thus, the DB2 accounting report data for programs ACCT001, ACCT002, and ACCT003 can be grouped together, and their accounting information summarized to represent the "ACCT" application. This is easy and amazingly effective! The summarized reports can report information at the application level, and categorize the package information by application. This lets you classify applications and address the ones that use the most resources. Without this type of summarized reporting, it's difficult to do any sort of truly productive tuning. Most DBAs and systems programmers who lack these reports and look only at online monitors or plan table information are really just shooting in the dark.

Reporting to Management
To do this type of tuning, you need buy-in from management and application developers. This can sometimes be the most difficult part because, unfortunately, most application tuning involves costly changes to programs. One way to demonstrate the potential ROI for programming time is to report the cost of application performance problems in terms of dollars. Most capacity planners have formulas for converting CPU time into dollars. If you get this formula from the capacity planner, you can easily turn your daily package summary report into an annual CPU cost per application. Give this report to the big guy and watch his head spin! This is a really great tool for getting those resources allocated to get the job done. The following shows a simple chart developed using an in-house naming standard and a CPU-to-dollars formula:

[Chart: ANNUAL CPU COST per application]

Make sure you produce a "cost reduction" report, in dollars, once a phase of the tuning has completed. This makes it perfectly clear to management what the tuning efforts have accomplished and gives incentive for further tuning efforts. Consider providing a visual representation of your data. A bar chart with before and after results can be highly effective in conveying performance tuning impact.

Finding the Statements of Interest
Once you've found the highest consuming packages and obtained management approval to tune them, you need additional analysis of the programs that present the best opportunity. Ask questions such as:

• Which DB2 statements are executing and how often?
• How much CPU time is required?
• Is there logic in the program that's resulting in an excessive number of unnecessary database calls?
• Are there SQL statements with relatively poor access paths?

Involve managers and developers in your investigation. It's much easier to tune with a team approach where different team members can be responsible for different analysis.

There are many ways to gather statement-level information:

• Get program source listings and plan table information
• Watch an online monitor
• Run a short performance trace against the application of interest

Performance traces are expensive, sometimes adding as much as 20 percent to the overall CPU costs. However, a short-term performance trace may be an effective tool for gathering information on frequent SQL statements and their true costs. If plan table information isn't available for the targeted package, then you can rebind that package with EXPLAIN(YES). If it's hard to get the outage to rebind with EXPLAIN(YES), or a plan table is available for a different owner id, you could also copy the package with EXPLAIN(YES) (for example, execute a BIND into a special/dummy collection-id) rather than rebinding it.
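As a sketch of these two options (the collection, package, and owner names here are hypothetical), the rebind and the copy-bind alternatives might look like this:

REBIND PACKAGE(PRODCOLL.ACCT001) EXPLAIN(YES)

BIND PACKAGE(TUNECOLL) COPY(PRODCOLL.ACCT001) OWNER(TUNEID) EXPLAIN(YES)

The copy into a separate collection leaves the production package untouched while still populating a plan table you can analyze.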

The following example shows PLAN_TABLE data for two of the most expensive programs in our example:

[PLAN_TABLE excerpt for packages ACCT001 and RPTS001: query and step numbers, method, table name (PERSON_ACCT for ACCT001; PERSON, ACCOUNT, and ACCT_ORDER for RPTS001), access type, matching columns, access name (index), prefetch, and sort indicators.]

Here, our most expensive program issues a simple SQL statement with matching index access to the PERSON_ACCT table, and it orders the result, which results in a sort (Method=3). The programmer, when consulted, advised that the query rarely returns more than a single row of data. This turned out to be true. In this case a bubble sort in the application program replaced the DB2 sort, and the CPU associated with DB2 sort initialization was avoided. The bubble sort algorithm was almost never used because the query rarely returned more than one row. Since this query was executing many thousands of times per day, the CPU savings were substantial.

Our high-consuming reporting program is performing a hybrid join (Method=4). While we don't frown on hybrid joins, our system statistics were showing us that there were Relational Data Server (RDS) subcomponent failures in the subsystem. We wondered whether this query was causing an RDS failure and reverting to a tablespace scan. We tuned the statement to use a nested loop over the hybrid join; the RDS failures and subsequent tablespace scan were avoided, and CPU and elapsed time improved dramatically.

While these statements may not have caught someone's eye just by looking at EXPLAIN results, when combined with the accounting data they screamed for further investigation.

The Tools You Need
The application performance summary report identifies applications of interest and provides the initial tool to report to management. You use this report to identify the most expensive applications or programs, to sell the idea of tuning to management, and to drill down to more detailed levels in the other reports. This is your starting point for your application tuning exercise. You can also use these other tools, or pieces of documentation, to identify and tune SQL statements in your most expensive packages:

• Package Report A list of all packages in your system sorted by the most expensive first (usually by CPU).

• Trace or Monitor Report You can run a performance trace or watch your online monitor for the packages identified as high consumers in your package report. This type of monitoring will help you drill down to the high-consuming SQL statements within these packages. Don't overlook the importance of examining the application logic. The program may be executing the world's best-performing SQL statements, but if the data isn't used, then they're really poor-performing statements. This has to do primarily with the quantity of SQL statements being issued. The best performing SQL statement is the one that is never issued, and it's surprising how often application programs will go to the database when they don't have to.

• Plan Table Report Run extractions of plan table information for the high-consuming programs identified in your package report. You may quickly find some bad access paths that can be tuned quickly. Don't forget to consider the frequency of execution as indicated in the package report. Even a simple thing such as a small sort may be really expensive if executed often.

• DDL or ERD You're going to need to know about the database. This includes relationships between tables, column data types, clustering, and knowing where data is. An Entity Relationship Diagram (ERD) is the best tool for this, but if none is available, you can print out the Data Definition Language (DDL) SQL statements used to create the tables and indexes. If the DDL isn't available, you can use a tool such as DB2LOOK (yes, you can use this against a mainframe database) to generate the DDL.

• Index Report Produce a report of all indexes on tables in the database of interest. This report should include the index name, table name, column names, column sequence, cluster ratio, first key cardinality, and full key cardinality. Use this report when tuning SQL statements identified in the plan table or trace/monitor report. There may be indexes you can take advantage of, add or change, or even drop. Indexes not used will create an overhead for Insert, Delete, and Update processing as well as utilities.


CHAPTER 7
Application Design and Tuning for Performance

While the majority of performance improvements will be realized via proper application design, or by tuning the application, nowhere else does "it depends" matter more than when dealing with applications. This is because the performance design and tuning techniques you apply will vary depending upon the nature, needs, and characteristics of your application. This chapter will describe some general recommendations for improving performance in applications.

Issuing Only the Necessary Statements
The best performing statement is the statement that never executes. The first thing that you have to ask yourself is "is this statement necessary?" or "have I already read that data before?" One of the major issues with applications today is the quantity of SQL statements issued per transaction. The most expensive thing you can do is to leave your allied address space (DB2 distributed address space for remote queries) to go to the DB2 address spaces for a SQL request. You need to avoid this whenever possible. In all situations, and especially for object or service oriented applications, you should always check if an object has already been read in before reading it again. If you are concerned that an object is not "fresh" and that you may be updating old data, then you can employ a concept called optimistic locking. With optimistic locking you don't have to constantly reread the object (the table or query to support the object). Instead you read it once, and when you go to update the object you can check the update timestamp to see if someone else has updated the object before you. This technique is described further in the locking section of this chapter.

Caching Data
If you are accessing code tables for validating data, translating values, editing data, validating the data input on a screen, or populating drop-down boxes, then those code tables should be locally cached for better performance. This is particularly true for any code tables that rarely change (if the codes are frequently changing then perhaps it's not a code table). This will avoid blindly and constantly rereading the data every time a method is invoked to retrieve the data. For a batch COBOL job, for example, read the code tables you'll use into in-core tables. For larger code tables you can employ a binary or quick search algorithm to quickly look up values. For CICS applications, set up the codes in VSAM files, and use them as CICS data tables. This is much faster than using DB2 to look up the value for every CICS transaction. The VSAM files can be refreshed on regular intervals via a batch process, or on an as-needed basis. If you are using a remote application, or Windows based clients, you should read the code tables once when the application starts, and cache the values locally to populate various on screen fields.

Utilizing DB2 for System Generated Keys
One of the ways you can reduce the quantity of SQL issued, and remove a major source of locking contention in your application, is to use DB2 system generated key values to satisfy whatever requirements you may have for system generated keys. Traditionally people have used a "next-key" table, which contained one row of one column with a numeric data type, to generate keys. This typically involved reading the column, incrementing and updating the column, and then using the new value as a key to another table. These next-key tables typically wound up being a huge application bottleneck.

There are several ways to generate key values inside DB2, two of which are identity columns (DB2 V7, DB2 V8, DB2 9) and sequence objects (DB2 V8, DB2 9). Identity columns are attached to tables, and sequence objects are independent of tables. If you are on DB2 V7 then your only choice of the two is the identity column. However, there are some limitations to the changes you can make to identity columns in DB2 V7, so you are best placing them in their own separate table of one row and one column, and using that as a next key generator table. The high performance solution for key generation in DB2 is the sequence object. Make sure that when you use sequence objects (or identity columns) you utilize the CACHE and ORDER settings according to your high performance needs. These settings will impact the number of values that are cached in advance of a request, as well as whether or not the order in which the values are returned is important. The settings for a high level of performance in a data sharing group would be, for example, CACHE 50 NO ORDER.

When using sequence objects or identity columns (or even default values, triggers, ROWIDs, the GENERATE_UNIQUE function, the RAND function, and more) you can reduce the number of SQL statements your application issues by utilizing a SELECT from a result table (DB2 V8, DB2 9). In this case that result table will be the result of an insert. In the following example we use a sequence object to generate a unique key, and then return that unique key back to the application:

SELECT ACCT_ID
FROM FINAL TABLE
     (INSERT INTO UID1.ACCOUNT (ACCT_ID, NAME, TYPE, BALANCE)
      VALUES (NEXT VALUE FOR ACCT_SEQ, 'Master Card', 'Credit', 50000))

Check the tips chapter for a cool tip on using the same sequence object for batch and online key assignment!
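As a minimal sketch of the sequence object assumed by the example above (the data type and start value are illustrative), the high performance settings mentioned for a data sharing group could be specified like this:

CREATE SEQUENCE ACCT_SEQ
       AS INTEGER
       START WITH 1
       INCREMENT BY 1
       CACHE 50
       NO ORDER

With CACHE 50 each DB2 member preallocates a range of 50 values, and NO ORDER tells DB2 it does not have to hand the values out in strict request sequence across members.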

Avoiding Programmatic Joins
If you are using a modern generic application design where there is an access module for each DB2 table, or perhaps you are using an object oriented or service oriented design, then you are using a form of programmatic joins. In almost every single situation in which a SQL join can be coded instead of a programmatic join, the SQL join will outperform the programmatic join. In our tests of a simple two table SQL join versus the equivalent programmatic join (FETCH loop within a FETCH loop in a COBOL program), the two table SQL join used 30% less CPU than the programmatic join. There is a certain set of trade-offs that have to be considered here:

• Flexible program and table access design versus improved performance for SQL joins
• The extra programming time for multiple SQL statements to perform single table access when required or joins when required versus the diminished performance of programmatic joins
• The extra programming required to add control breaks in the fetch loop versus the diminished performance of programmatic joins

DB2 does provide a feature known as an INSTEAD OF trigger (DB2 9) that allows somewhat of a mapping between the object world and multiple tables in a database. You can create a view that joins two tables that are commonly accessed together. Then the object based application can treat the view as a table. However, since the view is on a join, it is a read-only view. You can define an INSTEAD OF trigger on that view to allow INSERTs, UPDATEs, and DELETEs against the view. The INSTEAD OF trigger can be coded to perform the necessary changes to the tables of the join using the transition variables from the view.

Multi-Row Operations
In an effort to reduce the quantity of SQL statements the application is issuing, DB2 provides some multi-row operations. These include:

• Multi-row fetch (DB2 V8, DB2 9)
• Multi-row insert (DB2 V8, DB2 9)
• MERGE statement (DB2 9)

These are in addition to the possibility of doing a mass INSERT, UPDATE, or DELETE operation.

MULTI-ROW FETCH
Multi-row fetching gives us the opportunity to return multiple rows (up to 32,767) in a single API call with a potential CPU performance improvement somewhere around 50%. It works for static or dynamic SQL, and scrollable or non-scrollable cursors. There is also support for positioned UPDATEs and DELETEs. The sample programs DSNTEP4 (which is DSNTEP2 with multi-row fetch) and DSNTIAUL can also exploit multi-row fetch.

There are two reasons to take advantage of the multi-row fetch capability:

1. To reduce the number of statements issued between your program address space and DB2.
2. To reduce the number of statements issued between DB2 and the DDF address space.

The first way to take advantage of multi-row fetch is to program for it in your application code. Coding for a multi-row fetch is quite simple. The basic changes include:

• Adding the phrase "WITH ROWSET POSITIONING" to a cursor declaration
• Adding the phrases "NEXT ROWSET" and "FOR n ROWS" to the FETCH statement
• Changing the host variables to host variable arrays (for COBOL this is as simple as adding an OCCURS clause)
• Placing a loop within your fetch loop to process the rows

These changes are quite simple, and can have a profound impact on performance. In our tests of a sequential batch program, the use of a 50 row fetch (the point of diminishing return for our test) of 39 million rows of a table reduced CPU consumption by 60% over single-row fetch. In a completely random test where we expected on average 20 rows per random key, our 20 row fetch used 25% less CPU than the single-row fetch.

When using multi-row fetch, the GET DIAGNOSTICS statement is not necessary, and should be avoided due to high CPU overhead. Instead use the SQLCODE field of the SQLCA to determine whether your fetch was successful (SQLCODE 000), whether the fetch failed (negative SQLCODE), or whether you hit end of file (SQLCODE 100). If you received an SQLCODE 100 then you can check the SQLERRD3 field of the SQLCA to determine the number of rows to process.

The second way to take advantage of multi-row fetch is in our distributed applications that are using block fetching. Once in compatibility mode in DB2 V8, the blocks used for block fetching are built using the multi-row capability without any code change on our part. This results in great savings for our distributed SQLJ applications. In one situation the observed benefit of this feature was that a remote SQLJ application migrated from DB2 V7 to DB2 V8 did not have a CPU increase. Keep in mind, however, that multi-row fetch is a CPU saver, and not necessarily an elapsed time saver.

MULTI-ROW INSERT
As with multi-row fetch reading multiple rows per FETCH statement, a multi-row insert can insert multiple rows into a table in a single INSERT statement. Coding for a multi-row insert is quite simple. The INSERT statement simply needs to contain the "FOR n ROWS" clause, and the host variables referenced in the VALUES clause need to be host variable arrays. The multi-row insert can be coded as ATOMIC, meaning that if one insert fails then the entire statement fails, or it can be coded as NOT ATOMIC ON SQLERROR CONTINUE, which means that any one failure of any of the inserts will only impact that one insert of the set. IBM states that multi-row inserts can result in as much as a 25% CPU savings over single row inserts. In addition, multi-row inserts can have a dramatic impact on the performance of remote applications in that the number of statements issued across a network can be significantly reduced.
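A minimal sketch of the coding changes described above, in embedded SQL (the table, cursor, and host variable names are illustrative; the host variable arrays would be defined with an OCCURS 50 clause in COBOL):

EXEC SQL
  DECLARE ACCT_CSR CURSOR WITH ROWSET POSITIONING FOR
  SELECT ACCT_ID, BALANCE
  FROM ACCOUNT
END-EXEC.

EXEC SQL
  FETCH NEXT ROWSET FROM ACCT_CSR
  FOR 50 ROWS
  INTO :ACCT-ID-ARR, :BALANCE-ARR
END-EXEC.

EXEC SQL
  INSERT INTO ACCOUNT_HIST (ACCT_ID, BALANCE)
  VALUES (:ACCT-ID-ARR, :BALANCE-ARR)
  FOR :ROW-CNT ROWS
END-EXEC.

After each FETCH the program loops through the returned rowset, and SQLERRD3 indicates how many rows were returned when end of file is reached.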

As with multi-row fetch, the GET DIAGNOSTICS statement is not initially necessary for a multi-row insert. Remember, if you get a SQLCODE of zero then all the inserts were a success, and there is no need for additional analysis. In the case of a failed non-atomic multi-row insert you'll get a SQLCODE of -253 if one or more of the inserts failed. Only then should you use GET DIAGNOSTICS to determine which ones failed.

Many times applications are interfacing with other applications. In these situations an application may receive a large quantity of data that applies to multiple rows of a table. Typically the application would in this case perform a blind update. That is, the application would simply attempt to update the rows of data in the table, and if any update failed because a row was not found, then the application would insert the data instead. In other situations, the application may read all of the existing data, compare that data to the new incoming data, and then programmatically insert or update the table with the new data.

MERGE STATEMENT
DB2 9 supports this type of processing via the MERGE statement. The MERGE statement updates a target (table or view, or the underlying tables of a fullselect) using the specified input data. Rows in the target that match the input data are updated as specified, and rows that do not exist in the target are inserted. MERGE can utilize a table or an array of variables as input. Since the MERGE operates against multiple rows, it can be coded as ATOMIC or NOT ATOMIC. The NOT ATOMIC option will allow rows that have been successfully updated or inserted to remain if others have failed. The GET DIAGNOSTICS statement should be used along with NOT ATOMIC to determine which updates or inserts have failed, and should be avoided for performance reasons unless it is needed. The following example shows a MERGE of rows on the employee sample table:

MERGE INTO EMP AS EXISTING_TBL
USING (VALUES (:EMPNO, :SALARY, :COMM, :BONUS)
       FOR :ROW-CNT ROWS)
       AS INPUT_TBL(EMPNO, SALARY, COMM, BONUS)
ON INPUT_TBL.EMPNO = EXISTING_TBL.EMPNO
WHEN MATCHED THEN UPDATE SET
     SALARY = INPUT_TBL.SALARY,
     COMM   = INPUT_TBL.COMM,
     BONUS  = INPUT_TBL.BONUS
WHEN NOT MATCHED THEN INSERT
     (EMPNO, SALARY, COMM, BONUS)
     VALUES (INPUT_TBL.EMPNO, INPUT_TBL.SALARY,
             INPUT_TBL.COMM, INPUT_TBL.BONUS)
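Where the NOT ATOMIC option is used and a -253 is returned, a sketch of the follow-up diagnostics might look like this (host variable names are illustrative; see the SQL Reference for the complete list of condition items):

EXEC SQL GET DIAGNOSTICS :COND-COUNT = NUMBER END-EXEC.

EXEC SQL GET DIAGNOSTICS CONDITION :COND-NUM
         :DIAG-SQLCODE = DB2_RETURNED_SQLCODE,
         :DIAG-ROW = DB2_ROW_NUMBER
END-EXEC.

The first statement returns the number of conditions raised; the second is issued once per condition to identify the failing row and its SQLCODE.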

As with the multi-row insert operation, the use of GET DIAGNOSTICS should be limited.

Placing Data Intensive Business Logic in the Database
There are many advanced features of DB2 UDB for z/OS that let you take advantage of the power of the mainframe server:

• Advanced and Complex SQL Statements
• User-Defined Functions (UDFs)
• Stored Procedures
• Triggers and Constraints

These advanced features allow applications to be written quickly by pushing some of the logic of the application into the database server. These advanced database features also allow application logic to be placed as part of the database engine itself, which in turn allows these types of processes to become more centralized and controllable. Most of the time, advanced functionality can be incorporated into the database using these features at a much lower development cost than coding the feature into the application itself. Also, in many cases having data intensive logic located on the database server will result in improved performance, as that logic can process the data at the server and only return a result to the client. Reusing existing logic will mean faster time to market for new applications that need that logic, and having the logic centrally located makes it easier to manage than client code. Using advanced SQL for performance was addressed in Chapter 3 of this guide, so let's address the other features here.

USER-DEFINED FUNCTIONS
Functions are a useful way of extending the programming power of the database engine. Functions allow us to push additional logic into our SQL statements, as well as push legacy processing into SQL statements. User-defined functions (UDFs) provide a major breakthrough in database programming technology. UDFs actually allow developers and DBAs to extend the capabilities of the database, just like complex SQL statements. UDFs place more logic into the highly portable SQL statements, making this logic more easily reusable enterprise wide. Virtually any type of processing can be placed in a UDF, including legacy application programs. This can be used to create some absolutely amazing results. User-defined scalar functions work on individual values of a parameter list, and return a single value result. A table function can return an actual table to a SQL statement for further processing (just like any other table). Once your processing is inside SQL statements you can put those SQL statements anywhere. So anywhere you can run your SQL statements (say, from a web browser) you can run your programs!
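As a minimal sketch of a SQL scalar UDF (the function, column, and table names are purely illustrative and not from the handbook):

CREATE FUNCTION ACCT_UTILIZATION(BALANCE DECIMAL(11,2), CREDIT_LIMIT DECIMAL(11,2))
  RETURNS DECIMAL(5,2)
  LANGUAGE SQL
  RETURN CASE WHEN CREDIT_LIMIT = 0 THEN 0
              ELSE (BALANCE * 100) / CREDIT_LIMIT
         END

Once defined, the logic travels with the statement, for example:

SELECT ACCT_ID
FROM ACCOUNT
WHERE ACCT_UTILIZATION(BALANCE, CREDIT_LIMIT) > 80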

Also, just like complex SQL, UDFs can be a performance advantage or disadvantage. If the UDFs process large amounts of data and return a result to the SQL statement, there may be a performance advantage over the equivalent client application code. That is, if the UDF results in the application program issuing fewer SQL statements, or in getting access to a legacy process, then chances are that the UDF is the right decision. Conversely, if a UDF is used to process data only, then it can be a performance disadvantage, especially if the UDF is invoked many times or embedded in a table expression, as data type casting (for SQL scalar UDFs compared to the equivalent expression coded directly in the SQL statement) and task switch overhead (external UDFs run in a stored procedure address space) are expensive (DB2 V8 relieves some of this overhead for table functions). In every good implementation there are trade-offs. Most of the trade-offs involve sacrificing performance for things like flexibility, reusability, and time to delivery. Converting a legacy program into a UDF in about a day's time, invoking that program from a SQL statement, and then placing that SQL statement where it can be accessed via a client process may just be worth that expense! Simply put, UDFs let you quickly web enable your legacy processes, and can be part of a valuable implementation strategy.

STORED PROCEDURES
Stored procedures are becoming more prevalent on the mainframe, and, like UDFs, stored procedures can be used to access legacy data stores. Since stored procedures can be used to encapsulate business logic in a central location on the mainframe, they offer a great advantage as a source of secured, centrally located, reusable code. By using a stored procedure, the client will only need to have authority to execute the stored procedures and will not need authority to the DB2 tables that are accessed from the stored procedures. A properly implemented stored procedure can help improve availability. Stored procedures can be stopped, queuing all requestors. A change can be implemented while access is prevented, and the procedures restarted once the change has been made. If business logic and SQL access are encapsulated within stored procedures there is less dependency on client or application server code for business processes. This simplifies the change process, and makes the code more reusable. In this design the client takes care of things like display logic and edits, and the stored procedure contains the business logic.

Stored procedures can be a performance benefit for distributed applications, or a performance problem. The major advantage to stored procedures is when they are implemented in a client/server application that must issue several remote SQL statements. The network overhead involved in sending multiple SQL commands and receiving result sets is quite significant; therefore proper use of stored procedures to accept a request, process that request with encapsulated SQL statements and business logic, and return a result will lessen the traffic across the network and reduce the application overhead. If a stored procedure is coded in this manner then it can be a significant performance improvement. Conversely, if the stored procedures contain only one or a few SQL statements, the advantages of security, availability, and reusability can still be realized, but performance will be worse than the equivalent single statement executions from the client due to task switch overhead. It is possible to minimize the impact on distributed application performance with the proper use of stored procedures.
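A minimal sketch of a SQL procedure that encapsulates one request behind a single call (the procedure, table, and parameter names are illustrative):

CREATE PROCEDURE GET_ACCT_SUMMARY
       (IN P_ACCT_ID INTEGER,
        OUT P_BALANCE DECIMAL(11,2),
        OUT P_STATUS CHAR(1))
       LANGUAGE SQL
BEGIN
  SELECT BALANCE, STATUS
  INTO P_BALANCE, P_STATUS
  FROM ACCOUNT
  WHERE ACCT_ID = P_ACCT_ID;
END

A remote client then issues a single CALL GET_ACCT_SUMMARY rather than sending the individual statements across the network.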

DB2 9 offers a significant performance improvement for stored procedures with the introduction of native SQL procedures. These unfenced SQL procedures will execute as run time structures rather than be converted into external C program procedures (as with DB2 V7 and DB2 V8). Running these native SQL procedures will eliminate the task switch overhead of executing in the stored procedure address space. This represents a significant performance improvement for SQL procedures that contain little program logic and few SQL statements.

TRIGGERS AND CONSTRAINTS
Triggers and constraints can be used to move application logic into the database, which can introduce some automatic and centrally controlled intense application logic. These features consist of:

• Triggers
• Database Enforced Referential Integrity (RI)
• Table Check Constraints

A trigger is a database object that contains some application logic in the form of SQL statements that are invoked when data in a DB2 table is changed. These triggers are installed into the database, and are then dependent upon the table on which they are defined. SQL DELETE, UPDATE, and INSERT statements can activate triggers. Triggers can be used to replicate data, enforce certain business rules, and fabricate data. Triggers can also be used to automatically invoke UDFs and stored procedures.

Database enforced RI can be used to ensure that relationships between tables are maintained automatically. Child table data cannot be created unless a parent row exists, and rules can be implemented to tell DB2 to restrict or cascade deletes to a parent when child data exists. Table check constraints are used to ensure the values of specific table columns, and are invoked during LOAD, insert, and update operations.

There are wonderful advantages to using triggers and constraints. Triggers and constraints ease the programming burden because the logic, in the form of SQL, is much easier to code than the equivalent application programming logic. This helps make the application programs smaller and easier to manage. Since the triggers and constraints are connected to DB2 tables, they are centrally located rules and are universally enforced. They most certainly provide for better data integrity and faster application delivery time, and they help to ensure data integrity across many application processes. The greatest advantage to triggers and constraints is that they are generally data intensive operations, and these types of operations are better performers when placed close to the data. Since the logic in triggers and constraints is usually data intensive, their use typically outperforms the equivalent application logic simply due to the fact that no data has to be returned to the application when these automated processes fire.

There is one trade-off for performance, however. When triggers, RI, or check constraints are used in place of application edits they can be a serious performance disadvantage. This is especially true if several edits on a data entry screen are verified at the server. It could be as bad as one trip to the server and back per edit. This would seriously increase message traffic between the client and the server. For this reason, data edits are best performed at the client when possible.
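A minimal sketch of the three features (all table, constraint, and trigger names are illustrative):

-- Database enforced RI: restrict deletion of a parent that still has children
ALTER TABLE ORDER_ITEM
  ADD CONSTRAINT FK_ORDER FOREIGN KEY (ORDER_ID)
      REFERENCES ORDERS (ORDER_ID) ON DELETE RESTRICT

-- Table check constraint: enforce allowable column values
ALTER TABLE ACCOUNT
  ADD CONSTRAINT CK_ACCT_TYPE CHECK (ACCT_TYPE IN ('C', 'S', 'L'))

-- Trigger: record an audit row whenever a balance changes
CREATE TRIGGER ACCT_AUDIT
  AFTER UPDATE OF BALANCE ON ACCOUNT
  REFERENCING OLD AS O NEW AS N
  FOR EACH ROW MODE DB2SQL
  INSERT INTO ACCT_AUDIT_LOG (ACCT_ID, OLD_BAL, NEW_BAL, CHG_TS)
  VALUES (N.ACCT_ID, O.BALANCE, N.BALANCE, CURRENT TIMESTAMP)

Because each of these fires inside the database engine, no data has to be returned to the application to perform the check or the audit insert.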

It is important to understand that when you are working with triggers you need to respect the triggers when performing schema migrations. The triggers will have to be recreated in the same sequence they were originally created, and if there are multiple triggers of the same type against a table then they will be executed in the order they were defined. In certain situations trigger execution sequence may be important, since changes to the triggering tables can be expensive.

Organizing Your Input
One way to ensure the fastest level of performance for large scale batch operations is to make sure that the input data to the process is organized in a meaningful manner. This could mean pre-sorting data into the proper order prior to processing. That is, the data is sorted according to the cluster of the primary table, and all of the tables accessed by the large batch process should have the same cluster as the input. Remember, a locally executing batch process that is processing the input data in the same sequence as the cluster of your tables, and bound with RELEASE(DEALLOCATE), will utilize several performance enhancers, especially dynamic prefetch and index lookaside, to significantly improve the performance of these batch processes. If online access is truly random, then you can organize the tables and batch input files for the highest level of batch processing, and it should have little or no impact on the online transactions. If you code a generic transaction processor that handles both online and batch processing, then you could be asking for trouble.

Searching
Search queries, as well as driving cursors (large queries that provide the input data to a batch process), can be expensive queries. If you code a generic search query, you will get generic performance. In order to enable this type of generic access, a generic predicate has to be coded. In most cases this means that for every SQL statement issued more data will be read than is needed. This is due to the fact that DB2 has a limited ability to match on these types of predicates. Here there is, once again, a trade off between the amount of program code you are willing to write and the performance of your application.

In the following example the SELECT statement basically supports a direct read, a range read, and a restart read in one statement. The predicate supports a direct read, sequential read, and restart read for at least one part of a three part compound key:

WHERE COL1 = WS-COL1-MIN
AND (( COL2 >= WS-COL2-MIN
       AND COL2 <= WS-COL2-MAX
       AND COL3 >= WS-COL3-MIN
       AND COL3 <= WS-COL3-MAX)
  OR ( COL2 > WS-COL2-MIN
       AND COL2 <= WS-COL2-MAX ))
OR ( COL1 > WS-COL1-MIN
     AND COL1 <= WS-COL1-MAX )

These predicates are very flexible; however, they are not the best performing. The predicate in this example most likely results in a non-matching index scan even though an index on COL1, COL2, COL3 is available. This means that the entire index will have to be searched each time the query is executed. This is not a bad access path for a batch cursor that is reading an entire table in a particular order. For any other query, it is a detriment. This is especially true for online queries that are actually providing three columns of data (all min and max values are equal). For larger tables the CPU and elapsed time consumed can be significant. The very best solution is to have separate predicates for the various numbers of key columns provided. This will allow DB2 to have matching index access for each combination of key parts provided.

FIRST KEY ACCESS:

WHERE COL1 = WS-COL1

TWO KEYS PROVIDED:

WHERE COL1 = WS-COL1
AND   COL2 = WS-COL2

THREE KEYS PROVIDED:

WHERE COL1 = WS-COL1
AND   COL2 = WS-COL2
AND   COL3 = WS-COL3

This will dramatically increase the number of SQL statements coded within the program, but will also dramatically increase the statement performance. If the additional statements are not desired then there is another choice for the generic predicates. This would involve adding a redundant Boolean term predicate. These Boolean term predicates will enable DB2 to match on one column of the index. Therefore, for this WHERE clause:

WHERE COL1 = WS-COL1-MIN
AND (( COL2 >= WS-COL2-MIN
       AND COL2 <= WS-COL2-MAX
       AND COL3 >= WS-COL3-MIN
       AND COL3 <= WS-COL3-MAX)
  OR ( COL2 > WS-COL2-MIN
       AND COL2 <= WS-COL2-MAX ))
OR ( COL1 > WS-COL1-MIN
     AND COL1 <= WS-COL1-MAX )

An additional redundant predicate can be added:

WHERE (COL1 = WS-COL1-MIN
AND   (( COL2 >= WS-COL2-MIN
         AND COL2 <= WS-COL2-MAX
         AND COL3 >= WS-COL3-MIN
         AND COL3 <= WS-COL3-MAX)
   OR  ( COL2 > WS-COL2-MIN
         AND COL2 <= WS-COL2-MAX ))
OR    ( COL1 > WS-COL1-MIN
        AND COL1 <= WS-COL1-MAX ))
AND   ( COL1 >= WS-COL1-MIN
        AND COL1 <= WS-COL1-MAX )

The addition of this redundant predicate does not affect the result of the query, but allows DB2 to match on the COL1 column.

In many cases it pays to study the common input fields for a search, and then code specific queries that match those columns only, and are supported by an index. Then the generic query can support the less frequently searched on fields.

Name searching can be a challenge. Once again we are faced with a choice: multiple queries to solve multiple conditions, or one large generic query to solve any request. Let's say that we need to search for two variations of a name to try and find someone in our database. The following query can be coded to achieve that (in this case a name reversal):

SELECT PERSON_ID
FROM   PERSON_TBL
WHERE  (LASTNAME = 'RADY' AND FIRST_NAME = 'BOB')
OR     (LASTNAME = 'BOB' AND FIRST_NAME = 'RADY');

This query gets the job done, but uses multi-index access at best. Another way you could code the query is as follows:

SELECT PERSON_ID
FROM   PERSON_TBL
WHERE  LASTNAME = 'RADY'
AND    FIRST_NAME = 'BOB'
UNION ALL
SELECT PERSON_ID
FROM   PERSON_TBL
WHERE  LASTNAME = 'BOB'
AND    FIRST_NAME = 'RADY'

This query gets better index access, but will probe the table twice. The next query uses a common table expression (DB2 V8, DB2 9) to build a search list, and then divides that table into the person table:

WITH PRSN_SEARCH(LASTNAME, FIRST_NAME) AS
  (SELECT 'RADY', 'BOB' FROM SYSIBM.SYSDUMMY1
   UNION ALL
   SELECT 'BOB', 'RADY' FROM SYSIBM.SYSDUMMY1)
SELECT PERSON_ID
FROM   PERSON_TBL A, PRSN_SEARCH B
WHERE  A.LASTNAME = B.LASTNAME
AND    A.FIRST_NAME = B.FIRST_NAME

This query gets good index matching and perhaps reduced probes. Finally, the next query utilizes a during join predicate to probe on the first condition and only apply the second condition if the first finds nothing. That is, it will only execute the second search if the first finds nothing and completely avoid the second probe into the table. Keep in mind that this query may not produce the same results as the previous queries due to the optionality of the search:

SELECT COALESCE(A.PERSON_ID, B.PERSON_ID)
FROM   SYSIBM.SYSDUMMY1
LEFT OUTER JOIN
       PERSON_TBL A
ON     IBMREQD = 'Y'
AND    (A.LASTNAME = 'RADY' OR A.LASTNAME IS NULL)
AND    (A.FIRST_NAME = 'BOB' OR A.FIRST_NAME IS NULL)
LEFT OUTER JOIN
       (SELECT PERSON_ID
        FROM   PERSON_TBL
        WHERE  LASTNAME = 'BOB'
        AND    FIRSTNME = 'RADY') AS B
ON     A.EMPNO IS NULL

Which search query is the best for your situation? Test and find out! Just keep in mind that when performance is a concern there are many choices.

Existence Checking
What is the best for existence checking within a query? Is it a join, a correlated subquery, or a non-correlated subquery? Of course it depends on your situation, but these types of existence checks are always better resolved in a SQL statement than with separate queries in your program. Here are the general guidelines:

Non-correlated subqueries, such as this:

SELECT SNAME
FROM S
WHERE S# IN (SELECT S# FROM SP WHERE P# = 'P2')

are generally good when there is no available index for the inner select but there is one on the outer table column (indexable), or when there is no index on either the inner or outer columns. We also like non-correlated subqueries when there is a relatively small amount of data provided by the subquery. Keep in mind that DB2 can transform a non-correlated subquery to a join.

Correlated subqueries, such as this:

SELECT SNAME
FROM S
WHERE EXISTS (SELECT *
              FROM SP
              WHERE SP.P# = 'P2'
              AND   SP.S# = S.S#)

are generally good when there is a supporting index available on the inner select and there is a cost benefit in reducing repeated executions to the inner table and the distinct sort for a join. They may be a benefit as well if the inner query could return a large amount of data if coded as a non-correlated subquery, as long as there is a supporting index for the inner query. Also, the correlated subquery can outperform the equivalent non-correlated subquery if an index on the outer table is not used (DB2 chose a different index based upon other predicates) and one exists in support of the inner table.

Joins, such as this:

SELECT DISTINCT SNAME
FROM S, SP
WHERE S.S# = SP.S#
AND   SP.P# = 'P2'

may be best if supporting indexes are available and most rows hook up in the join. Also, if the join results in no extra rows returned then the DISTINCT can also be avoided. Joins can provide DB2 the opportunity to pick the best table access sequence, as well as apply predicate transitive closure.

Which existence check method is best for your situation? We don't know, but you have choices and should try them out! It should also be noted that as of DB2 9 it is possible to code an ORDER BY and FETCH FIRST in a subquery, which can provide even more options for existence checking in subqueries! For singleton existence checks you can code FETCH FIRST and ORDER BY clauses in a singleton select. This could provide the best existence checking performance in a stand alone query:

SELECT 1 INTO :hv-check
FROM TABLE
WHERE COL1 = :hv1
FETCH FIRST 1 ROW ONLY
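As a rough sketch of that DB2 9 capability (this is a contrived illustration only, and the QTY column on SP is assumed here just for the example), an ORDER BY and FETCH FIRST can be pushed into the subquery itself:

SELECT SNAME
FROM S
WHERE S# IN (SELECT S#
             FROM SP
             WHERE P# = 'P2'
             ORDER BY QTY DESC
             FETCH FIRST 1 ROW ONLY)

Here the subquery hands back only the single most interesting row rather than the entire set of matching rows, which is the kind of option these clauses open up inside subqueries.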

Avoiding Read-Only Cursor Ambiguity and Locking
Lock avoidance was introduced in DB2 to reduce the overhead of always locking everything. DB2 will check to be sure that a lock is probably necessary for data integrity before acquiring a lock. Lock avoidance is critical for performance. Its effectiveness is controlled by application commits. We need to bind our programs with CURRENTDATA(NO) and ISOLATION(CS) in DB2 V7 in order to allow for lock avoidance. In DB2 V8 and DB2 9 CURRENTDATA(YES) can also avoid locks. Lock avoidance also needs frequent commits so that other processes do not have to acquire locks on updated pages, and this also allows for page reorganization to occur to clear the "possibly uncommitted" (PUNC) bit flags in a page. Frequent commits allow the commit log sequence number (CLSN) on a page to be updated more often, since it is dependent on the begin unit of recovery in the log, the oldest begin unit of recovery being required.

Lock avoidance is also generally used with both isolation levels of cursor stability (CS) and repeatable read (RR) for referential integrity constraint checks. For a plan or package bound with RR, a page lock is required for the dependent page if a dependent row is found. This will be held in order to guarantee repeatability of the error on checks for updating primary keys or for when deleting is restricted.

Although DB2 will consider ambiguous cursors as read-only, it is always best to code your read-only cursors with the clause FOR FETCH ONLY or FOR READ ONLY. The best way to avoid taking locks in your read-only cursors is to read uncommitted. Use the WITH UR clause in your statements to avoid taking or waiting on locks. Keep in mind however that using WITH UR can result in the reading of uncommitted, or dirty, data that may eventually be rolled back. If you are using WITH UR in an application that will update the data, then an optimistic locking strategy is your best performing option.

Optimistic Locking
With high demands for full database availability, as well as high transaction rates and levels of concurrency, reducing database locks is always desired. With this in mind, many applications are employing a technique called "optimistic locking" to achieve these higher levels of availability and concurrency. This technique traditionally involves reading data with an uncommitted read or with cursor stability. Update timestamps are maintained in all of the data tables. This update timestamp is read along with all the other data in a row. When a direct update is subsequently performed on the row that was selected, the timestamp is used to verify that no other application or user has changed the data between the point of the read and the update. This places additional responsibility on the application to use the timestamp on all updates, but the result is a higher level of DB2 performance and concurrency.

Here is a hypothetical example of optimistic locking. First the application reads the data from a table with the intention of subsequently updating:

SELECT UPDATE_TS, DATA1
FROM TABLE1
WHERE KEY1 = :WS-KEY1
WITH UR

Here the data has been changed and the update takes place:

UPDATE TABLE1
SET DATA1 = :WS-DATA1,
    UPDATE_TS = :WS-NEW-UPDATE-TS
WHERE KEY1 = :WS-KEY1
AND   UPDATE_TS = :WS-UPDATE-TS

If the data has changed then the update will get a SQLCODE of 100, and restart logic will have to be employed for the update. This requires that all applications respect the optimistic locking strategy and update the timestamp when updating the same table.

As of DB2 9, IBM has introduced built-in support for optimistic locking via the ROW CHANGE TIMESTAMP. When a table is created or altered, a special column can be created as a row change timestamp. These timestamp columns will be automatically updated by DB2 whenever a row of a table is updated. This built-in support for optimistic locking takes some of the responsibility (that of updating the timestamp) out of the hands of the various applications that might be updating the data.
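Such a column can be added to an existing table. This is only a sketch of the DB2 9 syntax — the column name UPDATE_RCT is made up for the example:

ALTER TABLE TABLE1
  ADD COLUMN UPDATE_RCT TIMESTAMP NOT NULL
      GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP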

Here is how the previous example would look when using the ROW CHANGE TIMESTAMP for optimistic locking:

SELECT ROW CHANGE TIMESTAMP FOR TABLE1, DATA1
FROM TABLE1
WHERE KEY1 = :WS-KEY1
WITH UR

Here the data has been changed and the update takes place:

UPDATE TABLE1
SET DATA1 = :WS-DATA1
WHERE KEY1 = :WS-KEY1
AND   ROW CHANGE TIMESTAMP FOR TABLE1 = :WS-UPDATE-TS

Commit Strategies and Heuristic Control Tables
Heuristic control tables are implemented to allow great flexibility in controlling concurrency and restartability. Our processing has become more complex, our tables have become larger, and our requirements are now 5 9's availability (99.999%). In order to manage our environments and their objects, we need to have dynamic control of everything. How many different control tables we have, and what indicator columns we put in them, will vary depending on the objects and the processing requirements.

Heuristic control/restart tables have rows unique to each application process to assist in controlling the commit scope, using the number of database updates or time between commits as their primary focus. There can also be an indicator in the table to tell an application that it is time to stop at the next commit point. These tables are accessed every time an application starts a unit-of-recovery (unit-of-work), which would be at process initiation or at a commit point.

Values in these tables can be changed either through SQL in a program or by a production control specialist to be able to dynamically account for the differences in processes through time. For example, you would probably want to change the commit scope of a job that is running during the on-line day vs. when it is running during the evening. You can also set indicators to tell an application to gracefully shut down, run different queries with different access paths due to a resource being taken down, or go to sleep for a period of time, such as during the unavailability of some particular resource.

The normal process is for an application to read the control table at the very beginning of the process to get the dynamic parameters and commit time to be used. The table is then used to store information about the frequency of commits as well as any other dynamic information that is pertinent to the application process. Once the program is running, it both updates and reads from the control table at commit time. Information about the status of all processing at the time of the commit is generally stored in the table so that a restart can occur at that point if required.
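The layout of such a control table is entirely up to the application. The following is only a minimal sketch (all table, column, and host variable names are hypothetical) of a control table, the read at process initiation, and the progress update at each commit point:

CREATE TABLE CONTROL_TBL
  (PROGRAM_NAME     CHAR(8)   NOT NULL,
   COMMIT_FREQUENCY INTEGER   NOT NULL,
   STOP_INDICATOR   CHAR(1)   NOT NULL WITH DEFAULT 'N',
   RESTART_KEY      CHAR(20)  NOT NULL WITH DEFAULT,
   LAST_COMMIT_TS   TIMESTAMP NOT NULL WITH DEFAULT)

SELECT COMMIT_FREQUENCY, STOP_INDICATOR, RESTART_KEY
INTO  :WS-COMMIT-FREQ, :WS-STOP-IND, :WS-RESTART-KEY
FROM CONTROL_TBL
WHERE PROGRAM_NAME = :WS-PROGRAM-NAME

UPDATE CONTROL_TBL
SET RESTART_KEY    = :WS-LAST-KEY,
    LAST_COMMIT_TS = CURRENT TIMESTAMP
WHERE PROGRAM_NAME = :WS-PROGRAM-NAME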

CHAPTER 8
Tuning Subsystems

There are several areas in the DB2 subsystem that you can examine for performance improvements. These areas are components of DB2 that aid in application processing. This chapter describes those DB2 components and presents some tuning tips for them.

Buffer Pools
Buffer pools are areas of virtual storage that temporarily store pages of table spaces or indexes. When a program accesses a row of a table, DB2 places the page containing that row in a buffer. When a program changes a row of a table, DB2 must write the data in the buffer back to disk (eventually), normally either at a DB2 system checkpoint or a write threshold. The write thresholds are either a vertical threshold at the page set level or a horizontal threshold at the buffer pool level. We will discuss this when we look at tuning the buffer pool thresholds.

The way buffer pools work is fairly simple by design, but it is tuning these simple operations that can make all the difference in the world to the performance of our applications. The data manager issues GETPAGE requests to the buffer manager, who hopefully can satisfy the request from the buffer pool instead of having to retrieve the page from disk. We often trade CPU for I/O in order to manage our buffer pools efficiently. Buffer pools are maintained by subsystem, but individual buffer pool design and use should be by object granularity and in some cases also by application.

Initial buffer pool definitions are set at installation/migration, but are often hard to configure at this time because the application process against the objects is usually not detailed at installation. But regardless of what is set at installation we can use ALTER any time after the install to add/delete new buffer pools, resize the buffer pools or change any of the thresholds. The buffer pool definitions are stored in the BSDS (Boot Strap Dataset) and we can move objects between buffer pools via an ALTER INDEX/TABLESPACE and a subsequent START/STOP command of the object.

DB2 buffer pool management by design allows the ability to ALTER and DISPLAY buffer pool information dynamically without requiring a bounce of the DB2 subsystem. This improves availability by allowing us to dynamically create new buffer pools when necessary and to also dynamically modify or delete buffer pools. We may find we need to do ALTERs of buffer pools a couple times during the day because of varying workload characteristics.
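For example (the buffer pool, database, and table space names below are placeholders, and the size is illustrative only, not a recommendation), an object can be moved to a resized buffer pool as follows:

-ALTER BUFFERPOOL(BP3) VPSIZE(40000)

ALTER TABLESPACE DB1.TS1 BUFFERPOOL BP3

-STOP DATABASE(DB1) SPACENAM(TS1)
-START DATABASE(DB1) SPACENAM(TS1)

The ALTER BUFFERPOOL and ALTER TABLESPACE take effect without a subsystem bounce; the STOP/START of the object picks up the new buffer pool assignment.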

Pages
There are three types of pages in virtual pools:
• Available pages: pages on an available queue (LRU, FIFO, MRU) for stealing
• In-Use pages: pages currently in use by a process that are not available for stealing. In-Use counts do not indicate the size of the buffer pool, but this count can help determine residency for initial sizing
• Updated pages: these pages are not 'in-use', not available for stealing, and are considered 'dirty pages' in the buffer pool waiting to be externalized

There are four page sizes and several buffer pools to support each size:

BP0 – BP49           4K pages
BP8K0 – BP8K9        8K pages
BP16K0 – BP16K9      16K pages
BP32K0 – BP32K9      32K pages

Work file table space pages are only 4K or 32K. There is a DSNZPARM called DSVCI that allows the control interval to match the actual page size. Our asynchronous page writes per I/O will change with each page size accordingly:

4K Pages     32 Writes per I/O
8K Pages     16 Writes per I/O
16K Pages    8 Writes per I/O
32K Pages    4 Writes per I/O

With these new page sizes we can achieve better hit ratios and have less I/O because we can fit more rows on a page. For instance, if we have a 2200 byte row (maybe for a data warehouse), a 4K page would only be able to hold 1 row, but if an 8K page was used 3 rows could fit on a page — 1 more than if 4K pages were used — and one less lock also, if required. However, we do not want to use these new page sizes as a band-aid for what may be a poor design. You may want to consider decreasing the row size based upon usage to get more rows per page.

Virtual Buffer Pools
We can have up to 80 virtual buffer pools. This allows for up to 50 4K page buffer pools (BP0 – BP49), up to 10 32K page buffer pools (BP32K – BP32K9), up to 10 8K page buffer pools and up to 10 16K page buffer pools. The size of the buffer pools is limited by the physical memory available on your system, with a maximum size for all buffer pools of 1TB. It does not take any additional resources to search a large pool versus a small pool. If you exceed the available memory in the system, then the system will begin swapping pages from physical memory to disk, which can have severe performance impacts.

Buffer Pool Queue Management
Pages used in the buffer pools are processed in two categories: Random (pages read one at a time) or Sequential (pages read via prefetch). These pages are queued separately: LRU — Random Least Recently Used queue, or SLRU — Sequential Least Recently Used queue (prefetch will only steal from this queue). The percentage of each queue in a buffer pool is controlled via the VPSEQT parameter (Sequential Steal Threshold). The way we process our data between our batch and on-line processes often differs, and often requires two settings — for example, one setting for batch processing and a different setting for on-line processing. Batch is usually more sequentially processed, whereas on-line is processed more randomly.

FIFO — First-in, first-out can also be used instead of the default of LRU. With this method the oldest pages are moved out regardless. This decreases the cost of doing a GETPAGE operation and reduces internal latch contention for high concurrency. This would only be used where there is little or no I/O and where the table space or index is resident in the buffer pool. We will have separate buffer pools with LRU and FIFO objects, and this can be set via the ALTER BUFFERPOOL command with a new PGSTEAL option of FIFO. LRU is the PGSTEAL option default.

Multiple subpools are created for a large virtual buffer pool and the threshold is controlled by DB2, not to exceed 4000 VBP buffers in each subpool. The LRU queue is managed within each of the subpools in order to reduce the buffer pool latch contention when the degree of concurrency is high. DB2 breaks up these queues into multiple LRU chains. This way there is less overhead for queue management because the latch that is taken at the head of the queue (actually on the hash control block which keeps the order of the pages on the queue) will be latched less because the queues are smaller. Stealing of these buffers occurs in a round-robin fashion through the subpools.

I/O Requests and Externalization
Synchronous reads are physical pages that are read in one page per I/O. Asynchronous reads are several pages read per I/O for prefetch operations such as sequential prefetch, dynamic prefetch or list prefetch. Synchronous writes are pages written one page per I/O. Asynchronous writes are several pages per I/O for such operations as deferred writes.

We want to keep synchronous reads and writes to only what is truly necessary, meaning small in occurrence and number. If not, we may begin to see buffer pool stress (maybe too many checkpoints). DB2 will begin to use synchronous writes if the IWTH threshold (Immediate Write) is reached (more on this threshold later in this chapter) or if 2 system checkpoints pass without a page being written that has been updated and not yet committed.

Pages are externalized to disk when the following occurs:
• DWQT threshold reached
• VDWQT threshold reached
• Dataset is physically closed or switched from R/W to R/O
• DB2 takes a checkpoint (LOGLOAD or CHKFREQ is reached)
• QUIESCE (WRITE YES) utility is executed
• A page is at the top of the LRU chain and another update is required of the same page by another process

We want to control page externalization via our DWQT and VDWQT thresholds for best performance and avoid surges in I/O. We do not want page externalization to be controlled by DB2 system checkpoints, because too many pages would be written to disk at one time, causing I/O queuing delays, increased response time and I/O spikes. During a checkpoint all updated pages in the buffer pools are externalized to disk and the checkpoint is recorded in the log (except for the work files).

Checkpoints and Page Externalization
DB2 checkpoints are controlled through the DSNZPARM — CHKFREQ. The CHKFREQ parameter is the number of minutes between checkpoints for a value of 1 to 60, or the number of log records written between DB2 checkpoints for a value of 200 to 16,000,000. The default value is 500,000. Often we may need different settings for this parameter depending on our workload. For example, we may want it higher during our batch processing. However, this is a hard parameter to set often, because it requires a bounce of the DB2 subsystem in order to take effect. Recognizing the importance of the ability to change this parameter based on workloads, the SET LOG CHKTIME command allows us to dynamically set the CHKFREQ parameter.

There have been other options added to the -SET LOG command to be able to SUSPEND and RESUME logging for a DB2 subsystem. SUSPEND causes a system checkpoint to be taken in a non-data sharing environment. By obtaining the log-write latch, any further log records are prevented from being created and any unwritten log buffers will be written to disk. Also, the BSDS will be updated with the high-written RBA. All further database updates are prevented until update activity is resumed by issuing a -SET LOG command to RESUME logging, or until a -STOP DB2 command is issued. These are single-subsystem only commands, so they will have to be entered for each member when running in a data sharing environment.

There are two real concerns for how often we take checkpoints:
• The cost and disruption of the checkpoints
• The restart time for the subsystem after a crash

Many times the costs and disruption of DB2 checkpoints are overstated. While a DB2 checkpoint is a tiny hiccup, it does not prevent processing from proceeding. Having a CHKFREQ setting that is too high, along with large buffer pools and high thresholds such as the defaults, can cause enough I/O to make the checkpoint disruptive. In trying to control checkpoints, some users increased the CHKFREQ value and made the checkpoints less frequent, but in effect made them much more disruptive. The situation is corrected by reducing the amount written and increasing the checkpoint frequency, which yields much better performance and availability. It is not only possible, but does occur at some installations, that a checkpoint every minute did not impact performance or availability.

In very general terms, during on-line processing DB2 should checkpoint about every 5 to 10 minutes, or some other value based on investigative analysis of the impact on restart time after a failure. The write efficiency at DB2 checkpoints is the key factor to be observed to see if CHKFREQ can be reduced. If the write thresholds (DWQT/VDWQT) are doing their job, then there is less work to perform at each checkpoint. Also, using the write thresholds to cause I/O to be performed in a level, non-disruptive fashion is also helpful for the non-volatile storage in storage controllers.
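As an illustration (the values are examples only, not recommendations), the checkpoint frequency can be changed on the fly, and logging can be suspended and resumed, with commands such as:

-SET LOG CHKTIME(5)
-SET LOG LOGLOAD(500000)
-SET LOG SUSPEND
-SET LOG RESUME

The first two forms switch the checkpoint trigger to a time-based or log-record-based value without a subsystem bounce; the last two suspend and resume logging as described above.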

However, even if we have our write thresholds (DWQT/VDWQT) set properly, we could still see an unwanted write problem. This could occur if we do not have our log datasets properly sized. If the active log data sets are too small, then active log switches will occur often. When an active log switch takes place, a checkpoint is taken automatically. Therefore, our logs could be driving excessive checkpoint processing, resulting in constant writes. This would prevent us from achieving a high ratio of pages written per I/O because the deferred write queue would not be allowed to fill as it should.

Sizing
Buffer pool sizes are determined by the VPSIZE parameter. This parameter determines the number of pages to be used for the virtual pool. DB2 can handle large bufferpools efficiently, as long as enough real memory is available. If insufficient real storage exists to back the bufferpool storage requested, then paging can occur. Paging can occur when the bufferpool size exceeds the available real memory on the z/OS image. DB2 limits the total amount of storage allocated for bufferpools to approximately twice the amount of real storage (but less is recommended). There is a maximum of 1TB total for all bufferpools (provided the real storage is available). In order to size bufferpools it is helpful to know the residency rate of the pages for the object(s) in the bufferpool.

Sequential vs. Random Processing
The VPSEQT (Virtual Pool Sequential Steal Threshold) is the percentage of the virtual buffer pool that can be used for sequentially accessed pages. This is to prevent sequential data from using all the buffer pool and to keep some space available for random processing. The value is 0 to 100% with a default of 80%. That would indicate that 80% of the buffer pool is to be set aside for sequential processing, and 20% for random processing. This parameter needs to be set according to how your objects in that buffer pool are processed.

One tuning option often used is altering the VPSEQT to 0 to set the pool up for just random use. When the VPSEQT is altered to 0, the SLRU will no longer be valid and the buffer pool is now totally random. Since only the LRU will be used, all pages on the SLRU have to be freed. This will also disable prefetch operations in this buffer pool, and this is beneficial for certain strategies. However, there are problems with this strategy for certain buffer pools, and this will be addressed later.

Writes
The DWQT (Deferred Write Threshold), also known as the Horizontal Deferred Write Threshold, is the percentage threshold that determines when DB2 starts turning on write engines to begin deferred writes (32 pages/Async I/O). The value can be from 0 to 90%. When the threshold is reached, DB2 turns on these write engines (up to 600 write engines as of this publication), basically one vertical pageset queue at a time, until a 10% reverse threshold is met. When DB2 runs out of write engines it can be detected in the WRITE ENGINES NOT AVAILABLE indicator on the statistics report. Running out of write engines can occur if the write thresholds are not set to keep a constant flow of updated pages being written to disk. This can occur, and if it is uncommon then it is okay, but if it occurs daily then there is a tuning opportunity.

When setting the DWQT threshold, a high value is useful to help improve the hit ratio for updated pages, but will increase I/O time when deferred write engines begin. We would use a low value to reduce I/O length for deferred write engines, but this will increase the number of deferred writes.

The DWQT threshold works at a buffer pool level for controlling writes of pages to the buffer pools, but for a more efficient write process you will want to control writes at the pageset/partition level. This can be controlled via the VDWQT (Vertical Deferred Write Threshold), the percentage threshold that determines when DB2 starts turning on write engines and begins the deferred writes for a given data set. This helps to keep a particular pageset/partition from monopolizing the entire buffer pool with its updated pages. The value is 0 to 90% with a default of 10%. The VDWQT should always be less than the DWQT.

Either a percentage of pages or an actual number of pages, from 0 to 9999, can be specified for the VDWQT. You must set the percentage to 0 to use the number specified. If set to 0,0 the system uses MIN(32, 1%), which is good for trickle I/O. It is a good idea to set the VDWQT using a number rather than a percentage, because if someone increases the buffer pool that means that now more pages for a particular pageset can occupy the buffer pool, and this may not always be optimal or what you want.

This threshold should be set based on the referencing of the data by the applications. It is normally best to keep this value low in order to prevent heavily updated pagesets from dominating the section of the deferred write area. A good rule of thumb for setting the VDWQT is that if less than 10 pages are written per I/O, set it to 0. You may also want to set it to 0 to trickle write the data out to disk. If we choose to set the VDWQT to zero, 32 pages are still written per I/O, but it will take 40 dirty pages (updated pages) to trigger the threshold, so that the highly re-referenced updated pages, such as space map pages, remain in the buffer pool.

If we choose to set the DWQT to zero, so that all objects defined to the buffer pool are scheduled to be written immediately to disk, then DB2 actually uses its own internal calculations for exactly how many changed pages can exist in the buffer pool before they are written to disk. 32 pages are still written per I/O, but it will take 40 dirty pages (updated pages) to trigger the threshold, so that the highly re-referenced updated pages, such as space map pages, remain in the buffer pool.

When implementing LOBs (Large Objects), a separate buffer pool should be used and this buffer pool should not be shared (backed by a group buffer pool in a data sharing environment). The DWQT should be set to 0 so that, for LOBs with LOG NO, force-at-commit processing occurs and the updates continually flow to disk instead of surges of writes. For LOBs defined with LOG YES, DB2 could use deferred writes and avoid massive surges at checkpoint.
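A hypothetical example of setting these thresholds (the buffer pool numbers and values are illustrative only, not recommendations):

-ALTER BUFFERPOOL(BP4) DWQT(30) VDWQT(0,128)
-ALTER BUFFERPOOL(BP5) DWQT(0) VDWQT(0,0)

The first command keeps the vertical threshold for BP4 expressed as an absolute number of pages, so it is unaffected by later changes to VPSIZE; the second sets a pool up for the trickle-write behavior described above.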

When looking at any performance report showing the amount of activity for the VDWQT and the DWQT, you would want to see the VDWQT being triggered most of the time (VERTIC.DEFER.WRITE THRESHOLD), and the DWQT extremely less (HORIZ.DEFER.WRITE THRESHOLD). There can be no general ratios, since that would depend on both the activity and the number of objects in the buffer pools. The bottom line is that we would want to be controlling I/O by the VDWQT, with the DWQT watching for and controlling activity across the entire pool and in general writing out rapidly queuing up pages. This will also assist in limiting the amount of I/O that checkpoint would have to perform.

Parallelism
The VPPSEQT — Virtual Pool Parallel Sequential Threshold is the percentage of the VPSEQT setting that can be used for parallel operations. The value is 0 to 100% with a default of 50%. If this is set to 0, then parallelism is disabled for objects in that particular buffer pool. This can be useful in buffer pools that cannot support parallel operations. The VPXPSEQT — Virtual Pool Sysplex Parallel Sequential Threshold is a percentage of the VPPSEQT to use for inbound queries. It also defaults to 50%, and if it is set to 0, Sysplex Query Parallelism is disabled when originating from the member the pool is allocated to. In affinity data sharing environments this is normally set to 0 to prevent inbound resource consumption of work files and bufferpools.

Stealing Method
The PGSTEAL option allows us to choose a queuing method for the buffer pools. The default is LRU (Least Recently Used), but FIFO (First In, First Out) is also an option. This option turns off the overhead for maintaining the queue and may be useful for objects that can completely fit in the bufferpool or if the hit ratio is less than 1%.

Page Fixing
You can use the PGFIX keyword with the ALTER BUFFERPOOL command to fix a buffer pool in real storage for an extended period of time. The PGFIX keyword has the following options:
• PGFIX(YES) The buffer pool is fixed in real storage for the long term. Page buffers are fixed when they are first used and remain fixed.
• PGFIX(NO) The buffer pool is not fixed in real storage for the long term. Page buffers are fixed and unfixed in real storage, allowing for paging to disk. PGFIX(NO) is the default option.

The recommendation is to use PGFIX(YES) for buffer pools with a high I/O rate, that is, a high number of pages read or written. For buffer pools with zero I/O, such as some read-only data or some indexes with a nearly 100% hit ratio, PGFIX(YES) is not recommended. In these cases, PGFIX(YES) does not provide a performance advantage.

Internal Thresholds
The following thresholds are a percent of unavailable pages to total pages, where unavailable means either updated or in use by a process.

SPTH
The SPTH Sequential Prefetch Threshold is checked before a prefetch operation is scheduled and during buffer allocation for a previously scheduled prefetch. If the SPTH threshold is exceeded, prefetch will either not be scheduled or will be canceled. PREFETCH DISABLED — NO BUFFER (an indicator on the statistics report) will be incremented every time a virtual buffer pool reaches 90% of active unavailable pages, disabling sequential prefetch. This value should always be zero. If this value is not 0, then it is a clear indication that you are probably experiencing degradation in performance due to all prefetch being disabled.

DMTH
The DMTH Data Manager Threshold (also referred to as the Buffer Critical Threshold) occurs when 95% of all buffer pages are unavailable (in use). This is checked every time a page is read or updated. If this threshold is not reached, then DB2 will access the page in the virtual pool once for each page (no matter how many rows are used). If this threshold has been reached, then DB2 will access the page in the virtual pool once for every row on the page that is retrieved or updated. This occurs by setting GETPAGE/RELPAGE processing by row instead of by page: after a GETPAGE and a single row is processed, a RELPAGE is issued. This will cause CPU to become high for objects in that buffer pool, and I/O sensitive transactions can suffer. This can occur if the buffer pool is too small, and it can lead to serious performance degradation. The Buffer Manager will request all threads to release any possible pages immediately. You can observe when this occurs by seeing a non-zero value in the DM THRESHOLD REACHED indicator on a statistics report.

IWTH
The Immediate Write Threshold (IWTH) is reached when 97.5% of buffers are unavailable (in use). If this threshold is reached, then synchronous writes begin, and this presents a performance problem. For example, if there are 100 rows in a page and there are 100 updates, then 100 synchronous writes will occur, one by one for each row. Synchronous writes are not concurrent with SQL, but serial, so the application will be waiting while the write occurs (including 100 log writes, which must occur first). This causes large increases in I/O time. It is not recorded explicitly in statistics reports, but DB2 will appear to be hung and you will see synchronous writes begin to occur when this threshold is reached. To eliminate this you may want to increase the size of the buffer pool (VPSIZE). Another option may be to have more frequent commits in the application programs to free pages in the buffer pool, as this will put the pages on the write queues. See the following note.

Note: Be aware that when looking at some performance reports, the IWTH counter can also be incremented when dirty pages on the write queue have been re-referenced, which has caused a synchronous I/O before the page could be used by the new process. This threshold counter can also be incremented if more than two checkpoints occur before an updated page is written, since this will cause a synchronous I/O to write out the page. Be careful with various monitors that send exception messages to the console when synchronous writes occur and refer to it as IWTH reached — not all synchronous writes are caused by this threshold being reached. This is simply being reported incorrectly.

3.e. 2. The incremental detail display shifts the time frame every time a new display is performed. work tables Basic tables Basic indexes Special for large clustered. List.TUNING SUBSYSTEMS Virtual Pool Design Strategies Separate buffer pools should be used based upon their type of usage by the applications (such as buffer pools for objects that are randomly accessed vs. CA PERFORMANCE HANDBOOK FOR DB2 FOR z/OS 123 . Prefetch I/O and Disablement (No buffer. size. No engine). the following steps can be used to help tune buffer pools. MORE DETAILED EXAMPLE OF BUFFER POOL OBJECT BREAKOUTS BP0 BP1 BP2 BP3 BP4 BP5 BP6 BP7 BP8 BP9 BP10 BP11 BP12 Catalog and directory — DB2 only use Work files (Sort) Code and reference tables — heavily accessed Small tables. Use command and view statistics Make changes (i. those that are sequentially accessed). heavily updated — trans tables. Dynamic Requests) Pages Read. 1. object placement) Use command again during processing and view statistics Measure statistics The output contains valuable information such as prefetch information (Sequential. less generic. range scanned table Special for Master Table full index (Random searched table) Special for an entire database for a special application Derived tables and “saved” tables for ad-hoc Staging tables (edit tables for short lived data) Staging indexes (edit tables for short lived data) Vendor tool/utility object Tuning with the -DISPLAY Buffer Pool Command In several cases the buffer pools can be tuned effectively using the DISPLAY BUFFERPOOL command. These are very generic breakouts just for this example. When a tool is not available for tuning. Actual definitions would be much finer tuned. Each one of these buffer pools will have its own unique settings and the type of processing may even differ between the batch cycle and the on-line day. thresholds. 4.

RID Pool
The RID (Row Identifier) pool is used for storing and sorting RIDs for operations such as:
• List Prefetch
• Multiple Index Access
• Hybrid Joins
• Enforcing unique keys while updating multiple rows

The RID pool is looked at by the optimizer for prefetch and RID use. The optimizer assumes physical I/O will be less with a large pool. You should have as large a RID pool as required, as it is a benefit for processing, and it can lead to performance degradation if it is too small. The RID pool could be set to 0, and this would disable the types of operations that use the RID pool, and DB2 would not choose access paths that the RID pool supports. At run time, a table space scan can be performed if not enough space is available in the RID pool. For example, if you want to retrieve 10,000 rows from a 100,000,000 row table and there is no RID pool available, then a scan of 100,000,000 rows would occur. The full use of the RID pool is possible for any single user at run time, at any time and without external notification, which may or may not be good depending on the size of the table accessed.

Sizing
The default size of the RID pool is currently 8 MB with a maximum size of 10000 MB, and it is controlled by the MAXRBLK installation parameter. The RID pool is created at start up time, but no space is allocated until RID storage is actually needed. It is then allocated in 32KB blocks as needed, until the maximum size you specified on installation panel DSNTIPC is reached. There are a few guidelines for setting the RID pool size. A good guideline for sizing the RID pool is as follows:

Number of concurrent RID processing activities x average number of RIDs x 2 x 5 bytes per RID

Statistics to Monitor
There are three statistics to monitor for RID pool problems:

RIDS OVER THE RDS LIMIT
This is the number of times list prefetch is turned off because the RID list built for a single set of index entries is greater than 25% of the number of rows in the table. If this is the case, DB2 determines that instead of using list prefetch to satisfy a query it would be more efficient to perform a table space scan. This is an application issue for access paths and needs to be evaluated for queries using list prefetch. Increasing the size of the RID pool will not help in this case.

There is one very critical issue regarding this type of failure. The 25% threshold is actually stored in the package/plan at bind time; therefore it may no longer match the real 25% value, and in fact could be far less. It is important to know what packages/plans are using list prefetch, and on what tables. If the underlying tables are growing, then the packages/plans that are dependent on them should be rebound after a RUNSTATS utility has updated the statistics. Key correlation statistics and better information about skewed distribution of data can also help to gather better statistics for access path selection and may help avoid this problem.

RIDS OVER THE DM LIMIT
This occurs when over 28 million RIDs were required to satisfy a query. Currently there is a 28 million RID limit in DB2. The consequences of hitting this limit can be fallback to a table space scan. In order to control this, you have a couple of options:
• Fix the index by doing something creative
• Add an additional index better suited for filtering
• Force list prefetch off and use another index
• Rewrite the query
• Maybe it just requires a table space scan

INSUFFICIENT POOL SIZE
This indicates that the RID pool is too small.

The SORT Pool
Sorts are performed in 2 phases:
• Initialization – DB2 builds ordered sets of 'runs' from the given input
• Merge – DB2 will merge the 'runs' together

DB2 allocates at startup a sort pool in the private area of the DBM1 address space. DB2 uses a special sorting technique called a tournament sort. During the sorting process it is not uncommon for this algorithm to produce logical work files called runs, which are intermediate sets of ordered data. If the sort pool is large enough then the sort completes in that area. More often than not the sort cannot complete in the sort pool and the runs are moved into the work file database. These runs are later merged to complete the sort. When the work file database is used for holding the pages that make up the sort runs, you could experience performance degradation if the pages get externalized to the physical work files, since they will have to be read back in later in order to complete the sort.

Size
The sort pool size defaults to 2MB unless specified. It can range in size from 240KB to 128MB and is set with an installation DSNZPARM. The larger the sort pool (sort work area) is, the fewer sort runs are produced. If the sort pool is large enough, then the buffer pools and sort work files may not be used. If the buffer pools and work file database are not used, then the better performance will be due to less I/O. We want to size the sort pool and work file database large because we do not want sorts to have pages being written to disk, especially if there are many rows to sort.

The EDM Pool
The EDM pool (Environmental Descriptor Manager) is made up of several components, each of which is in its own separate storage, and each contains many items including the following:
• EDM Pool
  – CTs – cursor tables (copies of the SKCTs)
  – PTs – package tables (copies of the SKPTs)
  – Authorization cache block for each plan, except those with CACHESIZE set to 0
• EDM Skeleton Pool
  – SKCTs – skeleton cursor tables
  – SKPTs – skeleton package tables
• EDM DBD cache
  – DBDs – database descriptors
• EDM Statement Cache
  – Skeletons of dynamic SQL for CACHE DYNAMIC SQL

Sizing
If the pool is too small, then you will see increased I/O activity in the following DB2 table spaces, which support the DB2 directory:

DSNDB01.DBD01
DSNDB01.SPT01
DSNDB01.SCT02

If the pool is too small, then you will also see increased response times due to the loading of the SKCTs, SKPTs, and DBDs, and the re-preparing of dynamic SQL statements because they could not remain cached. This can happen if the pool pages are stolen because the EDM pool is too small. Pages in the pool are maintained on an LRU queue, and the least recently used pages get stolen if required. A DB2 performance monitor statistics report can be used to track the statistics concerning the use of the EDM pools.

Our main goal for the EDM pool is to limit the I/O against the directory and catalog. By correctly sizing the EDM pools you can avoid unnecessary I/Os from accumulating for a transaction. If a SKCT, SKPT or DBD has to be reloaded into the EDM pool, this is additional I/O. If a new application is migrating to the environment it may be helpful to look in SYSIBM.SYSPACKAGE to give you an idea of the number of packages that may have to exist in the EDM pool, and this can help determine the size.
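A simple catalog query, sketched here with no particular qualification, can give a rough count of packages by collection as a starting point for that sizing estimate:

SELECT COLLID, COUNT(*) AS PACKAGE_COUNT
FROM SYSIBM.SYSPACKAGE
GROUP BY COLLID
ORDER BY PACKAGE_COUNT DESC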

Efficiency
We can measure the following ratios to help us determine if our EDM pool is efficient. Think of these as EDM pool hit ratios:
• CT requests versus CTs not in EDM pool
• PT requests versus PTs not in EDM pool
• DBD requests versus DBDs not in EDM pool

What you want is a value of 5 for each of the above (1 out of 5). An 80% hit ratio is what you are aiming for.

Dynamic SQL Caching
If we are going to use dynamic SQL caching we are going to have to pay attention to our EDM statement cache pool size. Cached statements are not backed by disk, and if their pages are stolen and the statement is reused, it will have to be prepared again. Static plans and packages can be flushed from the EDM pool by LRU, but they are backed by disk and can be retrieved when used again. There are statistics to help monitor cache use, and trace fields show the effectiveness of the cache and can be seen on the Statistics Long Report. In addition, the dynamic statement cache can be "snapped" via the EXPLAIN STMTCACHE ALL Explain statement. The results of this statement are placed in the DSN_STATEMENT_CACHE_TABLE described earlier in this chapter.

In addition to the global dynamic statement caching in a subsystem, an application can also cache statements at the thread level via the KEEPDYNAMIC(YES) bind parameter in combination with not re-preparing the statements. In these situations the statements are cached at the thread level in thread storage as well as at the global level. As long as there is not a shortage of virtual storage, the local application thread level cache is the most efficient storage for the prepared statements. The MAXKEEPD subsystem parameter can be used to limit the amount of thread storage consumed by applications caching dynamic SQL at the thread level.

Logging
Every system has some component that will eventually become the final bottleneck. Logging is not to be overlooked when trying to get transactions through the systems in a high performance environment. Logging can be tuned and refined, but the synchronous I/O associated with logging and commits will always be there.

Log Reads
When DB2 needs to read from the log it is important that the reads perform well, because reads are normally performed during recovery, restarts and rollbacks — processes that you do not want taking forever. When a log record is needed, DB2 will first look for the record in the log output buffer. If it finds the record there it can apply it directly from the output buffer. If it is not in the output buffer, then DB2 will look for it in the active log data set and then the archive log data set. When it is found, the record is moved to the input buffer so it can be read by the requesting process. An input buffer will have to be dedicated for every process requesting a log read. Reads from the output buffer and active logs are the better performers. If the record has to be read in from the archive log, the processing time will be extended. You can monitor the successes of reads from the output buffers and active logs in the statistics report. For this reason it is important to have large output buffers and active logs.

Log Writes
Applications move log records to the log output buffer using two methods — no wait or force. The 'no wait' method moves the log record to the output buffer and returns control to the application; however, if there are no output buffers available the application will wait. Successful moves without a wait are recorded in the statistics report under NO WAIT requests. A force occurs at commit time and the application will wait during this process, which is considered a synchronous write. Log records are then written from the output buffers to the active log datasets on disk either synchronously or asynchronously. To know how often this happens you can look at the WRITE OUTPUT LOG BUFFERS in the statistics report.

In order to improve the performance of the log writes there are a few options. First, we can increase the number of output buffers available for writing active log datasets, which is performed by changing an installation parameter (OUTBUFF). You would want to increase this if you are seeing that there are unavailable buffers. This can be observed in the statistics report when the UNAVAILABLE ACTIVE LOG BUFF has a non-zero value. This means that DB2 had to wait to externalize log records due to the fact that there were no available output log buffers. Providing a large buffer will improve performance for log reads and writes.

Locking and Contention
DB2 uses locking to ensure consistent data based on user requirements and to avoid losing data updates. You must balance the need for concurrency with the need for performance. If at all possible, you want to minimize:

SUSPENSION
An application process is suspended when it requests a lock that is already held by another application process and cannot be shared. The suspended process temporarily stops running.

TIME OUT
An application process is said to time out when it is terminated because it has been suspended for longer than a preset interval. DB2 terminates the process and returns a -911 or -913 SQL code.

DEADLOCK
A deadlock occurs when two or more application processes hold locks on resources that the others need and without which they cannot proceed.

The way DB2 issues locks is complex. It depends on the type of processing being done, the LOCKSIZE parameter specified when the table was created, the isolation level of the plan or package being executed, and the method of data access.

To monitor DB2 locking problems:
• Use the -DISPLAY DATABASE command to find out what locks are held or waiting at any moment
• Use EXPLAIN to monitor the locks required by a particular SQL statement, or all the SQL in a particular plan or package
• Activate a DB2 statistics class 3 trace with IFCID 172. This report outlines all the resources and agents involved in a deadlock and the significant locking parameters, such as lock state and duration, related to their requests
• Activate a DB2 statistics class 3 trace with IFCID 196. This report shows detailed information on timed-out requests, including the holder(s) of the unavailable resources

Starting a performance trace can involve substantial overhead, so be sure to qualify the trace with specific IFCIDs and other qualifiers to limit the data collected.

Thread Management
The thread management parameters on DB2 install panel DSNTIPE control how many threads can be connected to DB2, and determine the main storage size needed. These threads include TSO users, batch jobs, IMS, CICS, and tasks using the call attachment facility. Improper allocation of the parameters on this panel directly affects main storage usage. If the allocation is too high, storage is wasted. If the allocation is too low, performance degradation occurs because users are waiting for available threads.

The MAX USERS field on DSNTIPE specifies the maximum number of allied threads that can be allocated concurrently. The maximum number of threads that can be accessing data concurrently is the sum of this value and the MAX REMOTE ACTIVE specification. When the number of users trying to access DB2 exceeds your maximum, plan allocation requests are queued. You can use a DB2 performance trace record with IFCID 0073 to retrieve information about how often thread create requests wait for available threads.

The DB2 Catalog and Directory
Make sure the DB2 catalog and directory are on separate volumes. Even better, make sure the volumes are completely dedicated to the catalog and directory for best performance. If you have indexes on the DB2 catalog, place them on a separate volume as well. Also, consider isolating the DB2 catalog and directory into their own buffer pool. You can reorg the catalog and directory. You should periodically run RUNSTATS on the catalog and analyze the appropriate statistics that let you know when you need to reorg.

CHAPTER 9
Understanding and Tuning Your Packaged Applications

Many organizations today do not retain the manpower resources to build and maintain their own custom applications. In many situations, these organizations are relying on "off the shelf" packaged software solutions to help run their applications and automate some of their business activities such as accounting, billing, inventory management, customer management, and human resources. Most of the time these packaged applications are referred to as enterprise resource planning (ERP) applications. They are also typically database agnostic, which means that they are not written to a specific database vendor or platform, but rather they are written with a generic set of standardized SQL statements that can work on a variety of database management systems and platforms. This chapter will offer some tips as to how to manage and tune your DB2 database for a higher level of performance with these ERP applications.

Database Utilization and Packaged Applications
You should keep in mind that these ERP applications are intentionally generic by design, and are created in such a way that they can be customized by the organization implementing the software. This provides a great flexibility in that many of the tables in the ERP applications' database can be used in a variety of different ways, and for a variety of purposes. With this great flexibility also comes the potential for reduced database performance. These packaged applications are also typically written using an object oriented design. OO design may lead to an increase in SQL statements issued, random data access, and mismatching of predicates to indexes, and the flexibility of the implementation could have an impact on table access sequence.

Of course, when you purchase an application you very typically have no access to the source code, or the SQL statements issued. Sometimes you can get a vendor to change a SQL statement based upon information you provided about a performance problem, but this is typically not the case. So, you should assume that your ERP application is taking advantage of few if any of the DB2 SQL performance features, table or index designs, or specific database performance features. So, even though you can't change the SQL statements, you should first be looking at those statements for potential performance problems that can be improved by changing the database or subsystem. Finally, with that in mind, the majority of your focus should be on first getting the database organized in a manner that best accommodates the SQL statements, and then tuning the subsystem to provide the highest level of throughput.

Finding and Fixing SQL Performance Problems
The first thing that needs to be done when working with a packaged application is to determine where the performance problems exist. Whether or not the application is using static or dynamic SQL, it's best to capture all of the SQL statements being issued, cataloging them in some sort of document or perhaps even in a DB2 table. This can be achieved by querying the SYSIBM.SYSPACKSTMT table by package for statements in a package, or the SYSIBM.SYSSTMT table by plan for statements in a plan. You can query these tables using the plan or package names corresponding to the application. This would be the technique for packaged applications that are using static embedded SQL statements. However, if your packaged application is utilizing dynamic SQL then this technique will be less effective.

If the application is utilizing dynamic SQL then you can capture the SQL statements by utilizing EXPLAIN STMTCACHE ALL (DB2 V8 or DB2 9), or by running a trace (DB2 V7, DB2 V8, DB2 9). However, using the EXPLAIN STMTCACHE ALL statement will only capture the statements that are residing in the dynamic statement cache at the time the statement is executed. Keep in mind that by capturing these dynamic statements you are only seeing the statements that have been executing during the time you've monitored. Of course, if you run a trace you can expect to have to parse through a significant amount of data. In that case you are better off utilizing one or more of the techniques outlined in Chapter 3 "Recommendations for Distributed Dynamic SQL".

The best way to find the programs and SQL statements that have the potential for elapsed and CPU time savings is to utilize the technique described in Chapter 6 called "Overall Application Performance Monitoring". Once you've determined the packages and/or statements that are consuming the most resources, or have the highest elapsed time, then you can begin your tuning effort. Before you begin your tuning effort you have to decide if you are tuning in order to improve the response time for end users, or if you are tuning to save CPU dollars. This will involve subsystem and/or database tuning. You need to make a decision at this point, as subsystem tuning really has no impact on the application and database design.

It is most likely that you will not be able to change the SQL statements, as this will require getting the vendor involved and having them make a change that will improve a query for you. Even though you won't be able to change the SQL statements, you can change subsystem parameters, memory, and even change the tables and indexes for performance. However, if you make changes to the database, then you need to consider the fact that future product upgrades or releases can undo the changes you have done.
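As a sketch of both capture techniques (the collection name is a placeholder, and DSN_STATEMENT_CACHE_TABLE is assumed to already exist from the explain table DDL; the statistics columns shown are examples of the counters it carries):

SELECT COLLID, NAME, STMTNO, SEQNO, TEXT
FROM SYSIBM.SYSPACKSTMT
WHERE COLLID = 'ERP_COLLECTION'
ORDER BY NAME, STMTNO, SEQNO

EXPLAIN STMTCACHE ALL

SELECT STMT_ID, STAT_EXEC, STAT_CPU, STAT_ELAP, STMT_TEXT
FROM DSN_STATEMENT_CACHE_TABLE
ORDER BY STAT_CPU DESC

The first query lists the static statements for a given collection; the EXPLAIN STMTCACHE ALL statement snaps the dynamic statement cache so the second query can rank the cached dynamic statements by CPU consumed.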

Subsystem Tuning for Packaged Applications
The easiest way to tune packaged applications is by performing subsystem tuning. This could involve simply changing some subsystem parameters, or perhaps increasing memory. The impact of the changes can be immediate, and in situations in which the subsystem is poorly configured the impact can be dramatic. Please refer to Chapter 8 on subsystem tuning for the full details of tuning such things as:

• Buffer Pools  Buffer tuning can be a major contributor to performance with packaged applications. Consider first organizing your buffers according to the general recommendations in Chapter 8 of this guide, and monitor the buffer hit ratio for improvement. You can identify the code tables and get them all in their own buffer pool with enough memory to make them completely memory resident. You could also identify the most commonly accessed page sets and separate them into their own buffer pool. Once you've made these changes you can increase the amount of memory in the most highly accessed buffer pools (especially for indexes). Continue to increase the memory until you get a diminishing return on the improved hit ratio or you run out of available real memory. Then you can take a closer look at the access to individual page sets within the pools. You can use the -DISPLAY BUFFERPOOL command to get information about which page sets are accessed randomly versus sequentially. You can use this information to further separate objects based upon their access patterns.

• RID Pool  Make sure to look at your statistics report to see how much the RID pool is utilized, and if there are RID failures due to lack of storage. If so, you should increase the size of the RID pool.

• Dynamic Statement Cache  Is the application using dynamic SQL? If so, then you most certainly should make sure that the dynamic statement cache is enabled, and given enough memory to avoid binding on PREPARE. If the application is using dynamic SQL then this pool should be large enough to avoid I/O for statement binds. You can monitor the statement cache hit ratio by looking at a statistics report. Statement prepare time can be a significant contributor to overall response time, and you'll want to minimize that. If the statement prepare time is a big factor in query performance you could consider adjusting some DB2 installation parameters that control the bind time. These parameters — MAX_OPT_STOR, MAX_OPT_CPU, MAX_OPT_ELAP, and MXQBCE (DB2 V8, DB2 9) — are what is known as "hidden" installation parameters and will help control statement prepare costs, but most certainly should be adjusted only under the advice of IBM.

• DB2 Catalog  If it isn't already, the DB2 System Catalog should be in its own buffer pool.

• Workfile Database and Sort Pool  Your packaged application is probably doing a lot of sorting. If this is the case then you should make sure that your sort pool is properly sized to deal with lots of smaller sorts. In addition, you should have several workfile table spaces created in order to avoid resource contention between concurrent processes.

Please refer to Chapter 8 of this guide for more subsystem tuning options. You can also refer to the IBM redbook entitled "DB2 UDB for z/OS V8: Through the Looking Glass and What SAP Found There" for additional tips on performance.

Table and Index Design Options for High Performance

Chapter 4 of this guide should be reviewed for general table and index design for performance. However, there are certain areas that you can focus on for your packaged applications. The first thing you have to do is find the portions of the application that have the biggest impact on performance. You can use the techniques outlined in Chapters 3 and 6 of this guide to do that. Finding the packages and statements that present themselves as performance problems can then lead you to the most commonly accessed tables. Understanding how the application accesses the tables is important. Taking a look at how these tables are accessed by the programs (DISPLAY BUFFERPOOL command, accounting report, statistics report, performance trace, EXPLAIN) can give you clues as to potential changes for performance. Are there a lot of random pages accessed? Are matching columns of an index scan not matching on all available columns? Are there more columns in a WHERE clause than in the index? Does it look like there is repetitive table access? Are the transaction queries using list prefetch? If you are finding issues like these you can take action. In general, there are two major things you can change:

• Indexes: These packaged applications are generic in design, but your use of them will be customized for your needs. This means that the generic indexes should be adjusted to fit your needs. The easiest thing to do is add indexes in support of poor performing queries. However, you should be aware of the impact that adding indexes will have on inserts, deletes, possibly updates, and some utilities. That's why having a catalog of all the statements is important. You can also change indexes that contain variable length columns to use the NOT PADDED (DB2 V8, DB2 9) option to improve performance of those indexes.

• Clustering: Since the packaged application vendor cannot anticipate your most common access paths, you could have a problem with clustering. Look for page sets that have a very high percentage of randomly accessed pages in the buffer pool as a potential opportunity to change clustering. You can also look at joins that occur commonly and consider changing the clustering of those tables so that they match. This can have a dramatic impact on the performance of these joins, but you have to make sure that no other processes are negatively impacted. If tables are partitioned for a certain reason you could consider clustering within each partition (DB2 V8, DB2 9) to keep the separation of data by partition, but improve the access within each partition.
Other things you can do are to change the order of index columns to better match your most common queries, and to add columns to existing indexes, both in order to increase the number of matching columns for those queries.
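As an illustration of the idea (every object and column name here is made up for the example), if the application's most frequent query filters on CUST_ID and ORDER_DTE and also references STATUS, an index whose column order mirrors those predicates lets DB2 match on more columns and possibly achieve index-only access:

CREATE INDEX ERPUSER.XORDHIST2
    ON ERPUSER.ORDER_HIST
       (CUST_ID, ORDER_DTE, STATUS)
       PCTFREE 10;

Remember the caution above: every additional or widened index adds cost to inserts, deletes, possibly updates, and some utilities, so weigh the query benefit against that overhead.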

Of course you need to always be aware of the fact that any change you make to the database can be undone on the next release of the software. Therefore, you should document every change completely!

Monitoring and Maintaining Your Objects

The continual health of your objects is important. Performance can degrade quickly as these objects become disorganized. Keep a list of commonly accessed objects, especially those that are changed often, and begin a policy of frequent monitoring of these objects. Make sure to set up a regular strategy of RUNSTATS, REORGs, and REBINDs (for static SQL) to maintain performance. This involves regularly running the RUNSTATS utility on these frequently changing indexes and table spaces. Understanding the quantity of inserts to a table space can also guide you in adjusting the free space. Proper PCTFREE for a table space can help avoid expensive exhaustive searches for inserts. Proper PCTFREE is also very important for indexes where inserts are common, especially if there is little or no sequential scanning of the index. You don't want these applications to be splitting pages, so set the PCTFREE high.

Real Time Statistics

You can utilize DB2's ability to collect statistics in real time to help you monitor the activity against your packaged application objects. The statistics can be used by user written queries/programs, a DB2 supplied stored procedure, or Control Center, to make decisions for object maintenance.

STATISTICS COLLECTION
DB2 is always collecting statistics for database objects. The statistics are kept in virtual storage and are calculated and updated asynchronously upon externalization. Real time statistics allows DB2 to collect statistics on table spaces and index spaces and then periodically write this information to two user-defined tables (as of DB2 9 these tables are an integral part of the system catalog). In order to externalize them the environment must be properly set up. A new set of DB2 objects must be created in order to allow DB2 to write out the statistics. There are two tables (with appropriate indexes) that must be created to hold the statistics:

• SYSIBM.TABLESPACESTATS
• SYSIBM.INDEXSPACESTATS

These tables are kept in a database named DSNRTSDB, which must be started in order to externalize the statistics that are being held in virtual storage. SDSNSAMP(DSNTESS) contains the information necessary to set up these objects. DB2 will then populate the tables with one row per table space or index space, or one row per partition. For tables that are shared in a data-sharing environment, each member will write its own statistics to the RTS tables.
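If the RTS database is stopped, the in-memory counters cannot be written out, so a command along these lines (a sketch; ACCESS(RW) is the normal read-write state) is used to make sure it is started:

-START DATABASE(DSNRTSDB) SPACENAM(*) ACCESS(RW)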

Some of the important statistics that are collected for table spaces include: total number of rows, number of active pages, space allocated and extents, and the time of the last COPY, REORG, or RUNSTATS execution. Some statistics that may help determine when a REORG is needed include: space allocated, extents, number of unclustered inserts, number of disorganized LOBs, number of overflow records created since the last REORG, and the number of inserts, updates, or deletes (singleton or mass) since the last REORG or LOAD REPLACE. There are also statistics to help determine when RUNSTATS should be executed. These include: number of inserts/updates/deletes (singleton and mass) since the last RUNSTATS execution. Statistics collected to help with COPY determination include: distinct updated pages and changes since the last COPY execution, and the RBA/LRSN of the first update since the last COPY.

There are also statistics gathered on indexes. Basic index statistics include: total number of entries (unique or duplicate), number of levels, number of active pages, space allocated and extents, and the time when the last REBUILD, REORG or LOAD REPLACE occurred. Statistics that help to determine when a REORG is needed include: the number of updates/deletes (real or pseudo, singleton or mass) and inserts (random and those that were after the highest key) since the last REORG or REBUILD. These statistics are of course very helpful for determining how our data physically looks after certain processes (i.e. batch inserts) have occurred, so we can take appropriate actions if necessary.

EXTERNALIZING AND USING REAL TIME STATISTICS
There are different events that can trigger the externalization of the statistics. DSNZPARM STATSINT (default 30 minutes) is used to control the externalization of the statistics at a subsystem level. There are several processes that will have an effect on the real time statistics. Those processes include: SQL, utilities, and the dropping/creating of objects. Once externalized, queries can then be written against the tables. For example, a query against the TABLESPACESTATS table can be written to identify when a table space needs to be copied due to the fact that greater than 30 percent of the pages have changed since the last image copy was taken.

SELECT NAME
  FROM SYSIBM.SYSTABLESPACESTATS
 WHERE DBNAME = 'DB1'
   AND ((COPYUPDATEDPAGES*100)/NACTIVE) > 30

This table can also be used to compare the last RUNSTATS timestamp to the timestamp of the last REORG on the same object to determine when RUNSTATS is needed. If the date of the last REORG is more recent than the last RUNSTATS, then it may be time to execute RUNSTATS:

SELECT NAME
  FROM SYSIBM.SYSTABLESPACESTATS
 WHERE DBNAME = 'DB1'
   AND (JULIAN_DAY(REORGLASTTIME) > JULIAN_DAY(STATSLASTTIME))

The SYSTABLESPACESTATS table value REORGUNCLUSTINS can be used to determine whether you need to run REORG after a series of inserts. The next example may be useful if you want to monitor the number of records that were inserted since the last REORG or LOAD REPLACE that are not well-clustered with respect to the clustering index. 'Well-clustered' means the record was inserted into a page that was within 16 pages of the ideal candidate page (determined by the clustering index).

SELECT NAME
  FROM SYSIBM.SYSTABLESPACESTATS
 WHERE DBNAME = 'DB1'
   AND ((REORGUNCLUSTINS*100)/TOTALROWS) > 10

There is also a DB2 supplied stored procedure to help with this process. This stored procedure, DSNACCOR, is a sample procedure which will query the RTS tables and determine which objects need to be reorganized, image copied, or updated with current statistics, which have taken too many extents, and which may be in a restricted status. DSNACCOR creates and uses its own declared temporary tables and must run in a WLM address space. The output of the stored procedure provides recommendations by using a predetermined set of criteria in formulas that use the RTS and user input for their calculations. DSNACCOR can make recommendations for everything (COPY, REORG, RUNSTATS, EXTENTS, RESTRICT) or for one or more of your choice, and for specific object types (table spaces and/or indexes). Ideally, you can use these queries and the DSNACCOR recommendations to drive your utility scheduling, and possibly even work toward automating the whole determination/utility execution process.
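Even without the stored procedure, a simple query like this sketch (the threshold of 50 extents is arbitrary) can flag objects that have taken too many extents:

SELECT NAME, EXTENTS
  FROM SYSIBM.SYSTABLESPACESTATS
 WHERE DBNAME = 'DB1'
   AND EXTENTS > 50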


CHAPTER 10
Tuning Tips

Back against a wall? Not sure why DB2 is behaving in a certain way, or do you need an answer to a performance problem? It's always good to have a few ideas or tricks up your sleeve. Here are some DB2 hints and tips, in no particular order. Any trick you do use should be carefully documented in your program and/or statement.

Code DISTINCT Only When Needed
The use of DISTINCT can cause excessive sorting, which can cause queries to become very expensive. We often see DISTINCT coded when it is not necessary, and sometimes it comes from a lack of understanding of the data and/or the process. Sometimes code generators create a DISTINCT after every SELECT clause. Also, we have seen some programmers coding DISTINCTs just as a safeguard. This is rarely ever necessary. When considering the usage of DISTINCT, the question to first ask is 'Are duplicates even possible?' If the answer is no, then remove the DISTINCT and avoid the potential sort. If duplicates are possible and not desired, then use DISTINCT wisely. Try to use a unique index to help avoid a sort; only a unique index can help to avoid sorting for a DISTINCT. Coding a GROUP BY on all columns can be more efficient than using DISTINCT, because a GROUP BY of all columns can utilize non-unique indexes to possibly avoid a sort (see the short example after this tip). Also, avoid using DISTINCT more than once in a query, no matter where the SELECT is in a statement, and keep in mind that the use of DISTINCT in a table expression forces materialization. Consider using DISTINCT as early as possible in complex queries in order to get rid of the duplicates as early as possible.
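Here is a small illustration of the GROUP BY alternative (the ORDER_ITEM table and its columns are hypothetical). Both statements return the same de-duplicated result, but the second can use a non-unique index on (CUST_ID, PROD_CDE) to avoid the sort:

SELECT DISTINCT CUST_ID, PROD_CDE
  FROM ORDER_ITEM;

SELECT CUST_ID, PROD_CDE
  FROM ORDER_ITEM
 GROUP BY CUST_ID, PROD_CDE;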

Don't Let Java in Websphere Default the Isolation Level
ISOLATION(RS) is the default for Websphere applications. RS (Read Stability) is basically RR (Repeatable Read) with inserts allowed. RS tells DB2 that when you read a page you intend to read it or update it again later, and you want no one else to update that page until you commit. This causes DB2 to take share locks on pages and hold those locks, and can definitely cause increased contention.

Using Read Stability can lead to other known problems, such as the inability to get lock avoidance, and potential share lock escalations, because DB2 may escalate from an S lock on the page to an S lock on the tablespace. Another problem that has been experienced is that searched updates can deadlock under isolation RS. This is because in that situation DB2 will read the data with an S lock, and when it finds data to update it will change the S lock to an X lock. When an isolation level of RS is used, the search operation for a searched update will read the pages with S locks, put the results in a workfile with the RIDs, and then go back to do the update in RID sequence. This could be exaggerated by a self-referencing correlated subquery in the update (e.g. updating the most recent history or audit row). As the transaction volume increases these deadlocks are more likely to occur.

These problems have been experienced with applications that are running under Websphere and allowing it to determine the isolation level, which defaults to RS. Through researching this issue we have found no concrete reason why this is the default in Websphere, but for DB2 applications this is wasteful and should be changed. So how do you control this situation? Well, some possible solutions include the following.

1. Change the application server connect to use an isolation level of CS rather than RS. This can be done by setting a JDBC database connection property. This is the best option, but often the most difficult to get implemented.
2. Have the application change the statement to add WITH CS.
3. Rebind the isolation RS package being used with an isolation level of CS. This is a dirty solution because there may be applications that require RS, or the DB2 client software may be upgraded and result in a new RS package (or a rebind of the package), and we'll have to track that and perform a special rebind with every upgrade on every client.
4. Use the RRULOCK=YES zparm option, which acquires U locks, versus S locks, for update/delete with ISO(RS) or ISO(RR). This could help to avoid deadlocks for some update statements.

Locking a Table in Exclusive Mode Will NOT Always Prevent Applications from Reading It
From some recent testing, both in V7 and V8, we found that issuing LOCK TABLE <table name> IN EXCLUSIVE MODE would not prevent another application from reading the table with either UR or CS. In V7 this was the case if the tablespace was defined with LOCKPART(YES), and the V8 test results were the same because now LOCKPART(YES) is the default for the tablespace. Once an update was issued, then the table was not readable. This is working as designed, and the reason for this tip is to make sure that applications are aware that just issuing this statement does NOT immediately make a table unreadable.
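As a minimal illustration of the behavior described in this tip (the table name is hypothetical), the sequence below will not, by itself, block concurrent readers running with CS or UR; based on the testing described above, it is only after the first update that the table becomes unreadable to them:

LOCK TABLE ORDER_CTL IN EXCLUSIVE MODE;

UPDATE ORDER_CTL
   SET STATUS = 'HOLD'
 WHERE CTL_ID = 1;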

Put Some Data in Your Empty Control Table
Do you have a table that is used for application control? Is this table heavily accessed, but usually has no data in it? We often use tables to control application behavior with regards to availability or special processing. Many times these control tables are empty, and will only contain data when we want the application to behave in an unusual way. In testing with these tables we discovered that you do not get index lookaside on an empty index. This can add a significant number of getpage operations for a very busy application. If this is the case then you should see if the application can tolerate some fake data in the table. If so, you'll then get index lookaside and save a little CPU.

Use Recursive SQL to Assign Multiple Sequence Numbers in Batch
In some situations you may have a process that adds data to a database, but this process is utilized by both online and batch applications. If you are using sequence objects to assign system generated key values, this works quite well for online, but can get expensive in batch. Instead of creating an object with a large increment value, why not keep an increment of 1 for online and grab a "batch" of sequence numbers in batch? You can do this by using recursive SQL. Simply write a recursive common table expression that generates a number of rows equivalent to the number of sequence values you want in a batch, and then as you select from the common table expression you can get the sequence values. Here's what a SQL statement would look like for 1000 sequence values:

WITH GET_THOUSAND (C) AS
     (SELECT 1 FROM SYSIBM.SYSDUMMY1
      UNION ALL
      SELECT C+1 FROM GET_THOUSAND
      WHERE C < 1000)
SELECT NEXT VALUE FOR SEQ1
  FROM GET_THOUSAND;

Using the appropriate settings for block fetching in distributed remote applications, or by using multi-row fetch in local batch, you can significantly reduce the cost of getting these sequence values for a batch process.
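A sketch of the multi-row fetch variation in embedded SQL follows (the cursor name, host variable array, and rowset size are all made up for the example, and in COBOL each statement would sit between EXEC SQL and END-EXEC):

DECLARE CSEQ CURSOR WITH ROWSET POSITIONING FOR
   WITH GET_THOUSAND (C) AS
        (SELECT 1 FROM SYSIBM.SYSDUMMY1
         UNION ALL
         SELECT C+1 FROM GET_THOUSAND
         WHERE C < 1000)
   SELECT NEXT VALUE FOR SEQ1
     FROM GET_THOUSAND

OPEN CSEQ

FETCH NEXT ROWSET FROM CSEQ FOR 100 ROWS INTO :SEQ-ARRAY

Each FETCH returns up to 100 values into the host variable array, and when SQLCODE 100 is returned the SQLERRD(3) field of the SQLCA tells you how many values came back in that final rowset.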

Code Correlated Nested Table Expression versus Left Joins to Views
In some situations DB2 may decide to materialize a view when you are left joining to that view. If this is happening, it is not likely that you'll be able to get the performance desired, especially for transaction environments where your transactions are processing little or no data. Consider the following view and query:

CREATE VIEW HIST_VIEW (ACCT_ID, HIST_DTE, ACCT_BAL) AS
(SELECT ACCT_ID, HIST_DTE, MAX(ACCT_BAL)
   FROM ACCT_HIST
  GROUP BY ACCT_ID, HIST_DTE);

SELECT A.ACCT_ID, A.CUST_NME, B.HIST_DTE, B.ACCT_BAL
  FROM ACCOUNT A
  LEFT OUTER JOIN
       HIST_VIEW B
    ON A.ACCT_ID = B.ACCT_ID
 WHERE A.ACCT_ID = 1110087;

Consider instead the following query, which will strongly encourage DB2 to use the index on the ACCT_ID column of the ACCT_HIST table:

SELECT A.ACCT_ID, A.CUST_NME, B.HIST_DTE, B.ACCT_BAL
  FROM ACCOUNT A
  LEFT OUTER JOIN
       TABLE(SELECT ACCT_ID, HIST_DTE, MAX(ACCT_BAL) AS ACCT_BAL
               FROM ACCT_HIST X
              WHERE X.ACCT_ID = A.ACCT_ID
              GROUP BY ACCT_ID, HIST_DTE) AS B
    ON A.ACCT_ID = B.ACCT_ID
 WHERE A.ACCT_ID = 1110087;

This recommendation is not limited to aggregate queries in views, but can apply to many situations. Give it a try!

Use GET DIAGNOSTICS Only When Necessary for Multi-Row Operations
Multi-row operations can be a huge CPU saver, as well as an elapsed time saver for sequential applications. However, the GET DIAGNOSTICS statement is one of the most expensive statements you can use in an application, at about three times the cost of the other statements. So, you don't want to issue a GET DIAGNOSTICS after every multi-row statement. Instead, use the SQLCA first. For multi-row fetch you can use the SQLERRD3 field to determine how many rows were retrieved when you get a SQLCODE 100, so you never need to use a GET DIAGNOSTICS for multi-row fetch. If you get a non-negative SQLCODE from a multi-row operation, then you can use the GET DIAGNOSTICS statement to identify the details of the failure.

Database Design Tips for High Concurrency
In order to design for high concurrency in a database there are a few tips to follow. Most of the recommendations must be considered during design, before the tables and table spaces are physically implemented, because changing them after the data is in use would be difficult.

• Use segmented or universal table spaces, not simple – Will keep rows of different tables on different pages, so page locks only lock rows for a single table
• Use LOCKSIZE parameters appropriately – Keep the amount of data locked at a minimum unless the application requires exclusive access
• Consider spacing out rows for small tables with heavy concurrent access by using MAXROWS 1 – Row level locking could also be used to help with this, but the overhead is greater, especially in a data sharing environment
• Use partitioning where possible – Can reduce contention and increase parallel activity for batch processes – Can reduce overhead in data sharing by allowing for the use of affinity routing to different partitions – Locks can be taken on individual partitions
• Use Data Partitioned Secondary Indexes (DPSI) – Promotes partition independence and less contention for utilities
• Consider using LOCKMAX 0 to turn off lock escalation – In some high volume environments this may be necessary, but NUMLKTS will need to be increased and the applications must commit frequently

• Use volatile tables – Reduces contention because an index will also be used to access the data by different applications that always access the data in the same order
• Have an adequate number of databases – Reduces DBD locking if DDL, DCL and utility execution is high for objects in the same database
• Use sequence objects – Will provide for better number generation without the overhead of using a single control table

Database Design Tips for Data Sharing
In DB2 data sharing, the key to achieving high performance and throughput is to minimize the amount of actual sharing across members and to design databases/applications to take advantage of DB2 and data sharing. The best performing databases in data sharing are those that are properly designed. The following are some tips for properly designing databases for data sharing use:

• Proper partitioning strategies – Advantages include processing by key range, reducing or avoiding physical contention, reducing or avoiding logical contention, and more effective parallel processes
• Appropriate clustering and partitioning allows for better cloning of applications across members – Access is predetermined and sequential – Key ranges can be divided among members – Can better spread workload – Reduce contention and I/O – Can run utilities in parallel across members

Tips for Avoiding Locks
Lock avoidance has been around since Version 3, but are you getting it? Lock avoidance can be a key component of high performance, because taking an actual lock costs about 400 CPU instructions and 540 bytes of memory. To avoid a lock, a latch is taken by the buffer manager instead, and the cost is significantly less. The process of lock avoidance compares the page log RBA (the last time a change was made on the page) to the CLSN (Commit Log Sequence Number) on the log (the last time all changes to the page set were committed), and then also checks the PUNC (Possibly Uncommitted) bit on the page. If it finally determines there are no outstanding changes on the page, then a latch is taken instead of a lock. However, there are some prerequisites to getting lock avoidance:

• Up to V8 you need to use CURRENTDATA(NO) on the bind (see the example rebind after this list). As of V8/V9, CURRENTDATA(YES) will allow the process of detection to begin
• You need to be bound ISOLATION(CS)
• Have a read-only cursor
• Commit often – This is the most important point, because all processes have to commit their changes in order for the CLSN to get set. If the CLSN never gets set then this process can never work and locks will always have to be taken. Hint: you can use the URCHKTH DSNZPARM to find applications that are not committing
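As a sketch (the collection and package names are hypothetical), a rebind with the options that make lock avoidance possible for a read-only package would look like this:

REBIND PACKAGE(ERPCOLL.ORDRPGM) ISOLATION(CS) CURRENTDATA(NO)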

CA Performance Handbook Supplement for DB2 for z/OS .

Table of Contents

SECTION 1: About This Supplement
SECTION 2: Ensure Efficient Application SQL for DB2 for z/OS with CA Database Management
   CA SQL-Ease® for DB2 for z/OS
   CA Plan Analyzer® for DB2 for z/OS
SECTION 3: Ensure Efficient Object Access for DB2 for z/OS with CA Database Management
   CA Database Analyzer™ for DB2 for z/OS
   CA Rapid Reorg® for DB2 for z/OS
SECTION 4: Monitor Applications for DB2 for z/OS with CA Database Management
   CA Insight™ Database Performance Monitor for DB2 for z/OS
   CA Detector® for DB2 for z/OS
   CA Subsystem Analyzer for DB2 for z/OS
   CA Thread Terminator Tool
SECTION 5: Further Improve Application Performance for DB2 for z/OS with CA Database Management
   CA Index Expert™ for DB2 for z/OS

Copyright © 2007 CA. All Rights Reserved. One CA Plaza, Islandia, N.Y. 11749. All trademarks, trade names, service marks, and logos referenced herein belong to their respective companies.

SECTION 1
About This Supplement

This supplement to the CA Performance Management Handbook for DB2 for z/OS provides specific information on how CA Database Management for DB2 for z/OS addresses the performance management challenges outlined in the handbook with an approach that reduces CPU and DB2 resource overhead and streamlines labor intensive tasks with automated processes. The supplement provides information to ensure efficient application SQL, ensure efficient object access, monitor applications, and improve application performance.

About The CA Performance Management Handbook for DB2 for z/OS
The handbook is a technical resource from CA that provides information to consider as you approach database performance management, descriptions of common performance and tuning issues during design, development, and implementation of a DB2 for z/OS application, and specific techniques for performance tuning and monitoring.

• Chapter 1: Provides overview information on database performance tuning
• Chapter 2: Provides descriptions of SQL access paths
• Chapter 3: Describes SQL tuning techniques
• Chapter 4: Explains how to design and tune tables and indexes for performance
• Chapter 5: Describes data access methods
• Chapter 6: Describes how to properly monitor and isolate performance problems
• Chapter 7: Describes DB2 application performance features
• Chapter 8: Explains how to adjust subsystem parameters and configure subsystem resources for performance
• Chapter 9: Describes tuning approaches when working with packaged applications
• Chapter 10: Offers tuning tips drawn from real world experience

The primary audiences for the handbook are physical and logical database administrators. The handbook assumes a good working knowledge of DB2 and SQL, and is designed to help you build good performance into the application, database, and the DB2 subsystem. It provides techniques to help you monitor DB2 for performance, and to identify and tune production performance problems. Individuals can register to receive the handbook at ca.com/db.

For more information, visit ca.com/db


SECTION 2
Ensure Efficient Application SQL for DB2 for z/OS with CA Database Management

When developing DB2 applications it is critical that SQL statements within the application are as efficient as possible to ensure good database and application performance. Inefficient SQL can lead to poor application response times and high resource use and CPU costs within the DB2 system in which it runs. Coding efficient SQL is not always easy, since there are often a number of ways to obtain the same result set. SQL should be written to filter data effectively while returning the minimum number of rows and columns to the application. Data filtering should be as efficient as possible using correctly coded and ordered predicates. Some SQL functions such as sorting should be avoided if possible. If sorting is necessary, you should ensure your database design supports indexes to facilitate the sort ordering.

Sometimes the original design of the SQL is not entirely under your control. You may be implementing Enterprise Resource Planning (ERP) applications using dynamic SQL and packaged solutions on your systems. It is still important to ensure the SQL within these applications is efficient. Tuning SQL in ERP applications can yield substantial performance gains and cost savings.

Developing efficient SQL initially, and subsequently tuning existing SQL in your application, is very hard to do manually. It is much more cost effective to develop efficient SQL from the outset, during development, rather than to work to identify and fix inefficient SQL running in a live application. CA provides two products to assist you with SQL development and tuning, namely CA SQL-Ease® for DB2 for z/OS and CA Plan Analyzer® for DB2 for z/OS.

CA SQL-Ease® for DB2 for z/OS
CA SQL-Ease® for DB2 for z/OS (CA SQL-Ease) is designed to reduce the time and expertise required to develop highly efficient SQL statements for DB2 applications. CA SQL-Ease improves SQL performance by analyzing SQL and, using an expert rule system, offers SQL performance improvement recommendations as part of the development process, so that SQL is highly tuned before it is ever moved to production. It saves time by allowing developers to generate, analyze, test, and tune SQL online, within their ISPF editing session, without needing to compile, link, bind and execute the program. Developer productivity is improved as much of the manual coding effort is automated.

CA SQL-Ease supports C. whether by tablespace scan or index access. COBOL.com/db 4 ca. predicate coding guidelines offering efficiency improvements for predicate filters and physical object guidelines offering improvements to the object which will improve the SQL access to it. CA SQL-Ease allows the programmer to manipulate the DB2 statistics for the object to allow ‘what-if’ scenarios to be played to see what effect different volumes of user data will have on the efficiency of the SQL they are developing.When developing a new application. Program host variable memory structures and DECLARE TABLE constructs can also be generated automatically. service units and TIMERONS. predicate analysis. assembler and PL/1 programming languages. CA SQL-Ease also provides an Enhanced Explain function offering enhanced access path information and uses an expert system to provide recommendations to show how the efficiency of the SQL could be improved. The GEN function will also include suggested filters in the WHERE clause based on indexes defined on the tables and example join predicates for joined tables based on indexes and referential relationships. RI dependency. Enhanced Explain also offers eight different reports to fully document the SQL using summary report.com . It shows whether DB2 can use an index to evaluate the predicate and whether it will be evaluated as stage 1 or stage 2. tree diagram and object statistics. The programmer can choose which columns will be automatically included in the SQL statement thus saving them having to type the column names. Estimated costs are shown in ms. INSERT. CA SQL-Ease also allows the programmer to execute the SQL statement to ensure the expected result set is returned. object dependency. The EXPLAIN function can be used to determine the access path that will be used by DB2 for the SQL statement. For more information. visit ca. The PRED function allows the programmer to check how efficient the predicates coded for the SQL statement are. It shows how the data will be accessed. cost estimate. The expert system provides recommendations in 3 categories: SQL coding guidelines to improve the efficiency of the SQL statement. access path. Once the programmer has finished formatting the SQL statement the SYNTAX function can be used to check that everything is syntactically correct. The programmer can also use the NOTES function as an online SQL reference manual. whether prefetch is used and shows the level of DB2 locking. the programmer can use the GEN function to automatically generate SELECT. UPDATE or DELETE statements for a given table. understanding and documentation purposes. The STAND function can be used to convert the SQL statement into a standard format for easy viewing.

a group of SQL statements right up to a complete DB2 application. By using the Compare feature. CA Plan Analyzer can analyze SQL from DB2 plans. CA Plan Analyzer adds further expert system support by providing recommendations to improve SQL efficiency based on making changes to plans and packages. the results are stored away within an historical database. object dependency. This feature is especially useful when you upgrade to a new DB2 version.com/db CA PERFORMANCE HANDBOOK SUPPLEMENT FOR DB2 FOR z/OS 5 . visit ca.CA Plan Analyzer® for DB2 for z/OS CA Plan Analyzer for DB2 for z/OS (CA Plan Analyzer) is designed to improve DB2 performance by efficiently analyzing SQL and provides in-depth SQL reports and recommendations to show you how to fix resource-hungry and inefficient SQL statements in your application. Binding your plans and packages on a new DB2 version may lead to different access paths. Each time you use the Explain features of CA Plan Analyzer. tree diagram and object statistics. The powerful Compare feature allows you to identify changed SQL statements. CA Plan Analyzer uses the same Enhanced Explain capability and expert system as CA SQLEase to offer recommendations showing how you can improve the efficiency of your SQL by making changes to SQL coding. cost estimate. One of CA Plan Analyzer’s most powerful features provides you with the ability to monitor SQL statements and their efficiency over time. RI dependency. The same eight Enhanced Explain reports are available namely: summary report. This means that as you enhance your applications. changed host variables and changed access paths with powerful filtering based on changes in statement cost using ms. This means that inefficient SQL identified in real time by CA Detector can be analyzed by CA Plan Analyzer to help you fix your performance problem. CA Plan Analyzer can compare your SQL access paths to those previously and highlight any changes which may affect the performance of your application. The strategy stores the logical grouping and means you can create a logical grouping for an application or a group of applications if you wish. CA Plan Analyzer is also integrated with CA Detector for DB2 for z/OS so SQL can also be analyzed online directly from within CA Detector’s application performance management functions. predicate coding or by making changes to physical objects. CA Plan Analyzer can evaluate your access paths on the new DB2 version without needing to bind your plans or packages. a file. predicate analysis. CA Detector allows you to capture problematic dynamic SQL statements and pass them to CA Plan Analyzer to identify the cause of the problem. SQL statements and QMF queries from the DB2 Catalog. service units and TIMERONS. SQL can be analyzed from any source and at any level from a single SQL statement. It can also analyze statements from a DBRMLIB. an exported QMF query or statements entered online. you only need to evaluate SQL statements whose access paths have changed rather than every SQL statement in your entire application. DBRMs. access path. For more information. This is especially useful for ERP environments where dynamic SQL is used. or make physical database object changes. packages. You can create logical groups of SQL sources using SQL from any of the sources listed by creating a strategy.

delete. This is especially useful when you have a new or enhanced application and you want to evaluate how efficient the SQL will be with the large data volumes of your production DB2 environment. browse. you can BIND. and start stored procedures and maintain your DB2 SYSPROCEDURES table. or REBIND some or all of the plans or packages listed by entering primary or line commands. so for example you can find all statements that use exclusive (X) DB2 locking or all statements that perform tablespace scans. CA Plan Analyzer also gives you access to Statistics Manager for analyzing SQL performance by projecting data statistics so you can create ‘what if’ scenarios based on different data volumes within your DB2 objects. CA Plan Analyzer further provides powerful features allowing you to identify problem SQL within your applications.com . FREE. such as all plans that use uncommitted read as an isolation level. statement and object reports allowing the DBA to administer the total SQL environment. All utilities can be executed both online and in batch processing. DBRM.com/db 6 ca. Enhanced Explain CA Plan Analyzer provides a whole series of plan. For more information.CA Plan Analyzer also allows you to evaluate the efficiency of your application on another DB2 system without needing to move it there. In addition. package. You can search for SQL whose access characteristics may be undesirable. Working directly from any report. Along similar lines you can search for problem plans or packages which may use undesirable bind characteristics. CA Plan Analyzer also provides a Stored Procedures Maintenance facility giving you the ability to create. visit ca. update.

SECTION 3

Ensure Efficient Object Access for DB2 for z/OS with CA Database Management
Tuning your application SQL statements is critical but all that effort could be wasted if the DB2 Catalog statistics for your application objects did not accurately reflect the actual data volumes stored in your application objects, or if your physical DB2 objects became excessively disorganized. Efficient SQL access paths rely on accurate DB2 Catalog statistics. When you bind your application plans and packages, or if you are using dynamic SQL, the DB2 optimizer will use DB2 Catalog statistics to help determine which access path to use for a given SQL statement. For example, if the number of rows in a table is very small, DB2 may choose to use a tablespace scan rather than index access. This provides reasonable results as long as the data volumes in that table remain small. If the actual data volumes in the table become very large, using a tablespace scan would likely cause application performance to suffer.


Keeping your DB2 Catalog statistics current requires you to collect RUNSTATS on a regular basis. This is even more important today if you have ERP applications using dynamic SQL because data volumes can vary widely. Some tables can contain 0 rows one minute and then an hour later hold over a million. When would you run RUNSTATS on such an object? Even with dynamic SQL, the DB2 optimizer still has to rely on the DB2 Catalog statistics being accurate to choose the best access path. The key then is to collect RUNSTATS regularly. Given the immense volumes of data in large enterprises, running RUNSTATS can be a very expensive and time-consuming exercise. Application performance can also be badly impacted if your physical DB2 objects are excessively disorganized. The more work that DB2 has to do to physically locate the desired data rows, the longer your application will wait. ERP applications are notorious for spawning highly disorganized data. First, ERP applications tend to perform high volumes of INSERT statements which cause indexes to become badly disorganized and wasteful of space. Next, since most ERP objects contain only variable length rows, any UPDATE to a row that causes the row length to change can mean the row gets relocated away from its home page. Finally, because of the high volumes of transactions common is such ERP systems, physical objects can quickly become badly disorganized. This can impact the performance of your application significantly.


CA Database Analyzer™ for DB2 for z/OS
CA Database Analyzer™ for DB2 for z/OS (CA Database Analyzer) is designed to help you keep DB2 databases and SQL applications performing optimally by ensuring DB2 objects never become disorganized. Additionally, CA Database Analyzer can be used to replace RUNSTATS and collect DB2 Catalog statistics faster. This allows you to collect statistics more frequently with less impact to your system. Using CA Database Analyzer, there is never an excuse for having out-of-date DB2 Catalog statistics. CA Database Analyzer allows you to define Extract Procedures. These define a logical grouping of DB2 objects and supports extensive wildcarding of object names. It is simple to specify that you want to include all objects based on a creator ID of perhaps SAPR3, thus allowing all objects to be included for SAP using the schema name of SAPR3. So, a simple Extract Procedure definition could include many thousands of DB2 objects for a complete ERP application. Since object names are evaluated at run time, any newly created objects will automatically be included. Extract Procedures can be used to collect statistics, including RUNSTATS statistics, for all objects which meet the selection criteria. The process uses multi-tasking so multiple objects are processed in parallel. The process runs very quickly and does not impact DB2 or your DB2 applications.


In addition to collecting RUNSTATS statistics, an Extract Procedure will collect approximately twice the number of data points for each object that RUNSTATS does. The other major feature of statistics collection using an Extract Procedure is that all data points for every execution are stored away within a database. This allows for online reporting and also provides the opportunity for identifying trends in data volumes and allows you to forecast when an event will occur based on the current growth. The reporting function allows you to view the data points which have been collected for each DB2 object. Twenty-one different reports are available for tablespace objects while sixteen reports are provided of index objects. Each report can be viewed as a data query, a graph, a trend or a forecast. The latest data can be viewed or weekly, monthly, or in yearly aggregated values. Another powerful feature in CA Database Analyzer gives you the ability to selectively generate DB2 utility jobs based on object data point criteria. So, for example, you can specify that you want to run a REORG if the cluster ratio of any tablespace is lower than say 60%. This is achieved by creating an Action Procedure. An Action Procedure defines two things. You may specify under which conditions you would like to trigger a DB2 utility. Over 100 example triggers are provided in the tool and these can be evaluated against DB2 Catalog statistics, data points with the database and DB2 Real Time Statistics. You may also set a threshold value for many of the triggers. Once you have specified your triggers, you specify which DB2 utilities you would like to trigger. This supports all of the IBM utility tools, all of the CA utility tools, and various other utility programs and user applications. By tying an Action Procedure to an Extract Procedure you link the objects you are interested in to the triggers and trigger thresholds.



Using Extract and Action Procedures in this way, it is easy to not only ensure you have the latest statistics for your application objects, but you can also generate utility jobs, such as CA Rapid Reorg to REORG objects which are badly disorganized — automatically. This means you can REORG the objects which need to be reorganized and not waste time reorganizing other objects which are not disorganized. This approach helps you keep your applications performing at the optimum level whilst not wasting any of your valuable CPU resources.

CA Rapid Reorg® for DB2 for z/OS
CA Rapid Reorg® for DB2 for z/OS (CA Rapid Reorg) is a high-speed REORG utility that is designed to reorganize DB2 objects in the smallest amount of time. While providing the highest performance to cope with the huge data volumes common in today’s databases, CA Rapid Reorg also maintains the highest levels of data availability while the REORG is running. This means your applications can continue to run in parallel to the REORG process. CA Rapid Reorg is also highly efficient during execution; this means CPU use is low and thus minimizes the impact on system resources. High levels of performance are provided by using multi-tasking and parallel processing of partitions. Indexes can be built in parallel to the tablespace partitions and non-partitioning indexes can either be completely rebuilt or updated. Sorting of data is performed efficiently and data is sorted in its compressed form. Clever use of z/OS dataspaces allows buffer areas to be placed in fast storage rather than using slow work files. Lastly, a Log Monitor address space can be used to filter log records for use during the high speed log apply phase of an online REORG. With 24x7 application availability becoming the norm, online reorganization is a necessity. An online REORG uses a shadow copy of the DB2 object where the reorganized data is written and log records are applied. Once the REORG process has finished the shadow copy of the DB2 object is switched to become the ‘live’ copy of the object. This switch phase can cause problems for applications, particularly ERP applications where long running transactions can hold locks for many hours without committing. CA Rapid Reorg provides the greatest flexibility to control the switch phase for an online REORG. Whatever REORG utility you use, there will be a short time when availability of the DB2 object is suspended while the switch phase occurs. With CA Rapid Reorg, you get to choose and control when access to the object is suspended. You can choose that the REORG will wait until all application CLAIMS have completed before the REORG will suspend access. Once access is suspended the switch will happen quickly using FASTSWITCH. Using this approach you can avoid the situation with other REORG tools where a DRAIN is issued at the switch phase and has to wait for long running transactions to complete while at the same time preventing other SQL CLAIMS from the application from running. CA Rapid Reorg provides the best control over application availability for online REORG.



CA Rapid Reorg is integrated with the utility generation features of CA Database Analyzer. CA Database Analyzer can be used to determine the most critical application objects which need to be reorganized, based on user supplied selection criteria. It can then generate the JCL required to invoke CA Rapid Reorg. Using CA Rapid Reorg you can maintain the performance of your applications by ensuring the data is fully organized. CA Rapid Reorg is so efficient you may find you can afford to run REORG much more frequently, thus improving your application performance further.

SECTION 4
Monitor Applications for DB2 for z/OS with CA Database Management

After you have spent time and effort tuning your application SQL to be as efficient as possible and tuning your DB2 objects for high performance, it is always disappointing when your application performance does not meet your expectations. There could be many reasons for this. Your application workload and data volumes may not be what you expected, and these can cause performance bottlenecks. Additionally, these volumes can change over time and impact performance at some point in the future. You may have bottlenecks within the DB2 system itself, and you need to find them and take corrective action. Additionally, if you are using an ERP application using dynamic SQL, these are the most dynamic environments to manage and are the hardest to maintain with good stable performance.

You may have thousands of SQL statements within your application. Which ones are causing the performance problems? Should you be worried about the tablespace scan in your application which runs once a day, or maybe the highly efficient matching index scan which runs millions of times a day? How do you evaluate which one uses the most CPU and DB2 resources? The only way to answer these questions is to monitor your DB2 systems and applications, focusing on your most serious performance problems, and address those as they happen.

The problem, however, is that this is not easy. Monitoring your application performance and making changes to improve performance is a daily task and can be a time consuming exercise. DB2 provides accounting traces, performance traces and statistics reports, but these are difficult to interpret, are not automatic, and their use impacts the performance of the very systems you want to monitor. These resources also do not give you all the information you need or collate the data into a useful format so that you can identify where your performance problems lie. Lastly, it is hard to easily find where the performance bottlenecks lie. This is where automation is necessary and you need tools to help you with this.

CA Insight™ Database Performance Monitor for DB2 for z/OS
CA Insight™ Database Performance Monitor for DB2 for z/OS (CA Insight DPM) is a comprehensive real-time performance monitor for your DB2 systems and applications. It collects data from z/OS subsystem interfaces, DB2 and z/OS control blocks, zIIP processor fields and DB2 performance traces, and provides you online access to critical performance statistics, as well as application statistics to assess and troubleshoot problems as they arise. This allows you to quickly identify performance problems in real-time and to take corrective actions. CA Insight DPM can monitor DB2 subsystems, DB2 connections from CICS, IMS, NATURAL and network connected applications from outside the z/OS world.

lock activity. visit ca. The System Condition Monitor provides a single point for viewing overview status information for all DB2 systems you are monitoring. you can invoke EXPLAIN to identify the access path and see why you have that problem. DB2 system activity is collected based on user-defined time intervals. log activity and SQL activity counts. You may also write new on demand requests to address new requirements. EDM pool usage. the exception will be highlighted. timing information.CA Insight DPM is designed to operate with the lowest level of overhead. or the difference or delta between the current and most recent interval. storage usage. CA Insight DPM provides a System Condition Monitor function. including determining how long the thread has been active. all of which can be customized using the IQL language. When they occur. When viewing DB2 application thread activity. Thread information includes SQL text. An exception can also be configured to submit a user-defined Intelligent Module (IMOD) to automatically initiate corrective action. CA Insight DPM is supplied with many built-in requests. By setting thresholds in the Exception Monitor. The low overhead means you can afford to run CA Insight DPM 24 x 7. A data collector task is used to collect the DB2 information for monitoring use. When a DB2 processing limit is reached or exceeded. it dynamically enables traces as they are required. SQL counts. For more information. One further powerful feature of CA Insight DPM is the implementation of Insight Query Language (IQL). Having identified a poorly performing SQL statement. From there you can make the decision as to how you will fix the problem.com/db 12 ca. All screen displays within CA Insight DPM are built using IQL. The data collector can be customized for each DB2 subsystem. meaning you can monitor your DB2 systems and applications all of the time and never miss a performance problem. locks. you can drill down into a thread for deeper analysis of potential problems. DB2 system information is displayed as an accumulation of all intervals. Rather than requiring all account traces and all performance traces to be turned on. One of the most powerful features of CA Insight DPM is the Exception Monitor.com . how much of that time is spent in DB2 and how much time is spent waiting for DB2 resources. Distributed Data Facility (DDF) data and Resource Limit Facility (RLF) data. you do not need to constantly watch the screens looking for potential performance problems. Information is provided on buffer pool usage. application events and SQL events. buffer pool activity. This is the starting point for finding performance problems and then drilling down to identify the cause. CA Insight DPM includes hundreds of predefined exception conditions that can be selectively activated for system events. This makes CA Insight DPM very flexible and extensible for your environment and particular monitoring needs. making it easy to vary the set of collected performance information from one subsystem to the next and get the precise amount of detail desired. This allows you to instantly see when a potential performance issue exists and react to fix the problem. CA Insight DPM will notify you.

but at the application level. This can be done using wildcards to include or exclude SQL sources from your defined application. Additionally. From there you can view the SQL statement text and even jump to CA Plan Analyzer to take advantage of all of the enhanced explain capabilities of that tool to help solve the problem. CA Detector aggregates the information for all of the SQL sources within your application into a single view of performance data for your application. package or SQL statement. rather than tune your system to make-up for deficiencies in your application. CA Detector collects its data exclusively from accounting trace information. This can highlight for example if that single daily execution of a tablespace scan in your application is actually any worse in performance terms than the index scan which run millions of times a day. From there you can drill down and see which is the most resource intensive package. if you know you have a poorly performing application. Its low overhead means that you can run Detector 24x7 and never miss an application performance problem. This means you can tune your application to perform efficiently on your system. packages.CA Detector® for DB2 for z/OS CA Detector® for DB2 for z/OS (CA Detector) is an application performance analysis tool for DB2 which provides unique insight into your application workload and resource use. not at the DB2 system level. Not only that. The unique capability of CA Detector is that on any display it always shows the most resource intensive application. In addition to viewing data at the application level. then SQL statement. plan. plan. you can unload this if you wish and process it to further refine your performance data. CA Detector allows you to identify the programs and SQL statements that most significantly affect your DB2 system performance. from there the packages and from there down to the individual SQL statements. package. both dynamic and static. With CA Detector. CA Detector analyzes all SQL than runs on your system. Accounting trace data imposes a significantly lower monitoring overhead on your DB2 system than performance traces will. All information collected by CA Detector is stored in a datastore which rolls into a history datastore driven by the user defined interval period. This capability allows you to find the operations within your application which are consuming the most DB2 resources based on how your users are using the application. For example. DBRM and SQL statement level. It uniquely allows you to view activity at the application. For more information. This means you can store information concerning performance of your system over time. you specify which plans. you can drill down to see which is the most resource intensive plan. CA Detector allows you to view performance data at the DB2 system level and at the DB2 member level when DB2 datasharing is used. You can drill down from the application level to view the plans.com/db CA PERFORMANCE HANDBOOK SUPPLEMENT FOR DB2 FOR z/OS 13 . DBRMs and SQL statements belong to which application. the use of accounting data allows CA Detector to provide its unique focus on performance. visit ca.

com . having drilled down to view the busiest DB2 table in your system using CA Subsystem Analyzer. 14 ca.com/db CA Subsystem Analyzer for DB2 for z/OS CA Subsystem Analyzer for DB2 for z/OS (CA Subsystem Analyzer) is a complementary product to CA Detector and uses the same collection interval process and datastore. CA Detector has the ability to look for SQL error conditions occurring within your application. you will never know they are happening. This feature provides an automated method for monitoring for bad SQL which consumes vast amounts of DB2 resources. indexes. For example. Tight process flow integration between CA Subsystem Analyzer and CA Detector means that you can seamlessly and automatically transition between the two products. tablespaces. buffer pools and DASD volumes. You are then taken seamlessly into CA Detector to view those SQL statements where you can jump to CA Plan Analyzer if you like to take advantage of the enhanced explain capabilities of that tool. EDM pools. you can select that you want to view the SQL accessing that table. tables. When trying to improve application performance. Any SQL statement which exceeds the threshold value will be highlighted in the exception display for further investigation. CA Detector collects performance information for dynamic SQL in exactly the same way as it does for static SQL. then drill down logically to look at specific details. From here you can proactively go and fix the problem. CA Detector will trap the SQL errors and show them to you in the SQL Error display. CA Subsystem Analyzer is designed to show you how those applications are impacting your DB2 resources. For more information. With CA Subsystem Analyzer. CA Detector collects the SQL text for dynamic SQL and can even help you identify when similar dynamic statements are failing to take advantage of the statement cache because of the use of constants or literals in the SQL text. whether it is a list of DASD volumes or a list of buffer pools. These SQL statements are the statements to focus on to improve performance on a daily basis. There is no need to set an exception to do this. With the increased use of dynamic SQL within applications. system-wide view to identify and examine the most active DB2 databases. You may have SQL transactions which are constantly receiving a negative SQL code when trying to update a table. Making simple changes to use parameter markers and host variable can significantly improve the performance of your applications by taking full advantage of the dynamic SQL cache. datasets and even dataset extents. While CA Detector looks at how SQL is performing within your applications. CA Subsystem Analyzer works in the same way as CA Detector and always shows you the most active item in the list.Another powerful feature of CA Detector is the ability to view exceptions. DASD volumes. such as buffer pools. you can take a broad. RID pools. it is not always the application SQL which is causing the problem. visit ca. it is even more important that you can manage the performance of dynamic SQL statements in the same way as static SQL. particularly in ERP applications. Sometimes you need to make changes to your DB2 system to improve performance or remove a processing bottleneck. Exceptions can be defined by users using threshold values. Unless the application is written to notify you of those failures. In a similar way.

For more information, visit ca.com/db

CA Subsystem Analyzer for DB2 for z/OS

CA Subsystem Analyzer for DB2 for z/OS (CA Subsystem Analyzer) is a complementary product to CA Detector and uses the same collection interval process and datastore. While CA Detector looks at how SQL is performing within your applications, CA Subsystem Analyzer is designed to show you how those applications are impacting your DB2 resources. When trying to improve application performance, it is not always the application SQL which is causing the problem. Sometimes you need to make changes to your DB2 system to improve performance or remove a processing bottleneck.

With CA Subsystem Analyzer, you can take a broad, system-wide view to identify and examine the most active DB2 databases, tablespaces, tables, indexes, EDM pools, RID pools, buffer pools, DASD volumes, datasets and even dataset extents, and then drill down logically to look at specific details. CA Subsystem Analyzer works in the same way as CA Detector and always shows you the most active item in the list, whether it is a list of DASD volumes or a list of buffer pools. CA Subsystem Analyzer also allows you to view your system performance data at the DB2 system level and at the DB2 member level when DB2 data sharing is used.

It is sometimes hard to determine which objects are being used most frequently and should be isolated from each other to avoid contention. With CA Subsystem Analyzer it is easy. For example, you may notice that a certain buffer pool has very high activity. You can drill down to see which active objects are using that buffer pool, and you will see a list of tablespaces showing the tablespace with the highest activity at the top. You may decide that one of the tablespaces should be moved to a different buffer pool to reduce contention and improve performance. You can even drill down further to view the most active table and from there view the SQL activity occurring on that table if you like.

Tight process flow integration between CA Subsystem Analyzer and CA Detector means that you can seamlessly and automatically transition between the two products. For example, having drilled down to view the busiest DB2 table in your system using CA Subsystem Analyzer, you can select that you want to view the SQL accessing that table. You are then taken seamlessly into CA Detector to view those SQL statements, where you can jump to CA Plan Analyzer if you like to take advantage of the enhanced explain capabilities of that tool.

For more information, visit ca.com/db

CA Thread Terminator Tool

CA Thread Terminator is one of the tools supplied within the CA Value Pack. It is a very powerful tool that helps you manage your DB2 system and application in real time. The tool allows you to make changes to your DB2 environment immediately, whereas making the same change using normal DB2 techniques may require you to re-cycle DB2. You can make a whole list of changes to your system parameters, including changes to buffer pool sizes, storage sizes including the EDM pool, security parameters, logging parameters, thread parameters, operator parameters and so on. You can also add or delete active log datasets.

It is also possible to automate changes to these parameters by creating a schedule. A schedule lets you specify the change you want to make and allows you to specify when you want that change to take effect, maybe at 10 pm each day. The schedule will then implement that change at that time. Using schedules, it is possible to make significant changes to your DB2 system parameters to reflect the type of processing occurring at any point in the day. For example, during the day you may need buffer pools tuned for random I/O to support online transactions, whilst at night you need them to be tuned for batch transactions which tend to process data sequentially.
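As a rough illustration of that kind of scheduled change (the buffer pool name and threshold values here are hypothetical, not taken from the handbook), the native DB2 ALTER BUFFERPOOL command can lower the sequential steal threshold during the day, protecting randomly accessed pages for online work, and raise it again in the evening to leave more of the pool available for sequential batch processing:

    -ALTER BUFFERPOOL(BP2) VPSEQT(20)
    -ALTER BUFFERPOOL(BP2) VPSEQT(80)

The first command would be issued by the morning schedule and the second by the evening schedule.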

For more information. This provides you with an automated approach to identify and cancel run-away threads in your application. Another useful feature of CA Thread Terminator. In these profiles. you can set maximum threshold values for DB2 resources over which any thread will automatically be cancelled. This is a useful feature if you are in a situation where you have to REORG and object. visit ca.com .com/db 16 ca.CA Thread Terminator also allows you to set-up thread monitor profiles. by locks associated with threads are preventing the REORG from running. allows you to terminate all threads accessing a particular table or pageset.

For more information, visit ca.com/db

SECTION 5

Further Improve Application Performance for DB2 for z/OS with CA Database Management

After designing and implementing an efficient SQL application in your production system, sometimes performance does not meet your expectations. Alternatively, you may have implemented an ERP application only to find that performance is poor. Monitoring your application using tools like CA Detector and CA Insight DPM can highlight where the performance problem lies. You may be able to make further SQL performance improvements highlighted by the enhanced explain capability of CA Plan Analyzer. Additionally, you may be able to make database and system improvements highlighted by CA Subsystem Analyzer.

One area where performance can be improved further is effective index design. No doubt you will have designed your indexes to support fast access to the table data. The problem, however, is that when you designed your indexes you could never be certain what the distribution of your data would look like. It is only after you are in production, when production data volumes are loaded, that you can see how effective your indexes are. Alternatively, if you are using an ERP system such as SAP, you may have had no control over the design of the indexes which are implemented. Whatever indexes you have in your application, it may be that they do not provide the performance benefits you expected. Worse still, you may also have indexes which the DB2 optimizer never chooses to use. Both of these situations can impact application performance.

For more information, visit ca.com/db

CA Index Expert™ for DB2 for z/OS

CA Index Expert™ for DB2 for z/OS (CA Index Expert) makes the extremely complex and crucial job of index tuning much easier. This tool generates indexing recommendations based on an expert system and recommends the most appropriate indexes to optimize DB2 application performance. CA Index Expert analyzes the SQL statements used in your applications, along with execution frequency statistics, compares the analysis against a set of indexing rules, and makes intelligent index design decisions. This product saves time and reduces errors by analyzing complex SQL column dependencies and reviewing SQL to determine which DB2 objects are referenced. With CA Index Expert, you can analyze an entire application quickly and efficiently. Using data imported from CA Detector, CA Index Expert knows how existing indexes and tables are referenced and how often. This is an important benefit. CA Index Expert recommends indexing strategies at the application level, with suggestions based on actual SQL usage.
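As a purely hypothetical illustration of the kind of recommendation such index analysis can lead to (the table, column and index names below are invented for the example, not taken from the handbook), a frequently executed predicate that currently drives a tablespace scan can often be satisfied instead by a matching index on the predicate columns:

    -- Frequently executed statement found in the application workload:
    SELECT ORDER_NO, ORDER_TOTAL
      FROM ORDER_TABLE
     WHERE CUST_NO = ? AND ORDER_DATE > ?;

    -- A candidate index that would allow matching index access
    -- on both predicate columns:
    CREATE INDEX ORDIX1
        ON ORDER_TABLE (CUST_NO, ORDER_DATE);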

You can explore various 'what-if' scenarios to improve the use of indexes in your application. Customized indexes can be added, and redundant indexes can even be dropped, to improve performance based on your usage of the application. Building efficient indexes can significantly improve the performance of your DB2 application.

This is of particular value when working with ERP applications such as SAP using dynamic SQL. Using CA Detector, it is easy to identify a problem SQL transaction where performance improvements are needed. Analysis of the SQL using the enhanced explain capability of CA Plan Analyzer may find that index access is not being used for some reason. Remember, if this is an ERP application you may not be able to change the SQL to resolve the problem, since the actual SQL is unavailable to you. In these situations, using CA Index Expert you can obtain recommendations for changes to the index design which would allow index access to be used. Implementation of this index design will solve your application performance problem.

For more information, visit ca.com/db

CA, one of the world's largest information technology (IT) management software companies, unifies and simplifies complex IT management across the enterprise for greater business results. With our Enterprise IT Management vision, solutions and expertise, we help customers effectively govern, manage and secure IT.
