WRITING GOOD SQL: STRATEGIES, TIPS & TRICKS

OVERVIEW

• Introductory remarks - SQL, T-SQL, versions
• Places to go for more information
• What you need to know to write good SQL
• Working Method
• Overview of query batch processing
• Overview of Optimizer
• Using ‘showplan’ and ‘traceon/traceoff’
• General problem scenarios
• Specific SQL Tips
• Demos, examples
• CONTESTS WITH PRIZES!

SQL WHAT?

• SQL = Structured Query Language
• Developed in IBM’s San Jose research lab in the 1970s
• Now ANSI and ISO standard for RDBMS SQL
  o ANSI = American National Standards Institute
  o ISO = International Organization for Standardization
• SQL86 => SQL89 => SQL92
• Entry, Intermediate and Full compliance
• Embedded, Native, Dynamic
• DDL (Data Definition Language), DML (Data Modification Language), SAL (System Administration Language) and QL (Query Language - what we’ll mostly cover)
• Use Sybase ‘SET’ options for ‘Entry’ SQL92 compliance
• Focus today on ‘Transact-SQL’ not ‘PL/SQL’
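The ‘SET’ options bullet above can be illustrated with a short session script. This is a sketch of some Sybase ASE options commonly associated with SQL92 ‘Entry’ behavior. It is not the complete list, and option availability depends on your server version, so check the Transact-SQL User’s Guide before relying on it.

```sql
-- Sketch: a subset of Sybase ASE session options tied to SQL92 'Entry' behavior.
-- Not exhaustive; confirm against the Transact-SQL User's Guide for your version.
set ansinull on                    -- comparisons involving NULL follow SQL92 rules
set quoted_identifier on           -- "double quotes" now delimit identifiers, not strings
set string_rtruncation on          -- raise an error instead of silently truncating strings
set chained on                     -- a transaction starts implicitly with the first
                                   -- data retrieval or modification statement
set transaction isolation level 3  -- serializable reads (holdlock semantics)
set fipsflagger on                 -- warn about non-standard Transact-SQL extensions
go
```

Once quoted_identifier is on, string constants must be written with single quotes, because double-quoted text is read as an identifier.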

INFORMATION SOURCES ON SQL - SYBOOKS (in-house)

• Sybase ASE Release 11.5 Collection :: Transact-SQL User’s Guide
• Sybase ASE Release 11.5 Collection :: Sybase ASE Performance & Tuning Guide, Chapter 11

INFORMATION SOURCES ON SQL - BOOKS (in-house)

• A Guide to the SQL Standard by C. J. Date (4/97 avail)
• Sybase SQL Server Performance and Tuning Guide by Karen Paulsell, chapter 10 (similar content in SYBOOKS)
• Sybase SQL Server Performance Tuning by Shaibal Roy and Marc B. Sugiyama, pages 194-205
• Optimizing Transact-SQL: Advanced Programming Techniques by Rozenshtein, Abramovich & Birger
• The Database Experts’ Guide to SQL by Frank Lusardi (now getting a little dated but still useful)

INFORMATION SOURCES ON SQL - ARTICLES (in-house)

• The Essence of SQL: A Guide to Learning Most of SQL in the Least Amount of Time by David Rozenshtein in SQL Forum Vol 4 No 4, whole issue (later a book) [Excellent short introduction to SQL]
• Performance Tips for Transact-SQL, slides from a presentation by Jeff Lichtman
• Subquery Processing Performance Improvements, slides from a presentation by Jeff Lichtman

INFORMATION SOURCES ON SQL - Web (accessible)

• http://www.sybase.com [main Sybase site]
• http://www.isug.com [International Sybase Users Group]
• http://www.edbarlow.com [Ed Barlow’s Sybase stuff site]
• http://www.oracle.com [Main Oracle site]
• http://technet.oracle.com [Oracle Technology Network - white papers, product info, code snips, tech info]
• http://www.oracle.com/education/oln [Oracle's online Web training site]
• http://www.ioug.org [International Oracle Users Group - white papers, IOUG master class training et cetera]
• http://www.ixora.com.au [Ixora - an Australian site. Steve Adams' site - expert on Oracle internals. Oracle on Unix, internals, tech stuff]
• http://uccbt.uchicago.edu/cbtweb/english/cbtweb/index.htm => SQL Programming: Database Queries (4 hr)
• http://uccbt.uchicago.edu/cbtweb/english/cbtweb/index.htm => Oracle Introduction, several sections

INFORMATION SOURCES ON SQL - NEW BOOKS

• Oracle PL/SQL Tips & Techniques (Oracle Series) by Joseph C. Trezzo (the definitive 'Tips' book on PL/SQL)
• Oracle8 PL/SQL Programming by Urman and Rinaldi
• 1001 SQL Tips by Konrad King (due out in April 2001)
• SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL by M. J. Hernandez
• SQL for Smarties: Advanced SQL Programming by Joe Celko
• SQL: The Complete Reference (Osborne's complete reference series) by Groff and Weinberg
• The Guru's Guide to Transact-SQL by Kenneth W. Henderson

WHAT DO YOU NEED TO KNOW TO WRITE GOOD SQL?

YOU NEED TO KNOW A LOT:
• Understand the data objects you’ll use (esp. tables, views, indexes, keys)
• Understand the relationships between the data objects you’ll use (esp. join opportunities, cardinality)
• Understand the meaning of the data you’ll use (meaning, values, NULLs, missing data, uniqueness)
• Understand how much data you are querying or generating
• Understand the SQL syntax
• Understand the report requirements or the SQL processing that is needed
• Understand how the SQL optimizer works
• Understand how your SQL articulates with a “wrapping language” if present
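Sybase system procedures give a quick picture of a table before you query it, which covers the ‘understand the data objects’ and ‘how much data’ bullets above. A minimal sketch follows. SLAcct10, the example table used later in this deck, stands in for whatever table you are about to query.

```sql
-- Sketch: inspect a table before writing SQL against it.
-- SLAcct10 stands in for the table you are about to query.
sp_help SLAcct10          -- columns, datatypes, nullability and index information
go
sp_helpindex SLAcct10     -- index names, key columns, clustered or nonclustered
go
sp_spaceused SLAcct10     -- approximate row count and space used
go
select count(*) from SLAcct10   -- exact row count; may scan the whole table
go
```

On a large table, use sp_spaceused first. Its row count is approximate but cheap, while count(*) can itself be an expensive query.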

HOW TO WRITE SQL - GOOD WORKING METHODS

• Proceed incrementally, iteratively and empirically.
• Have some idea what to expect…how long it should take and what work you are asking the server to accomplish (indexing, key structure et cetera). [Most of the time!]
• Use progressively more complex prototypical SQL, examining results after each query, until the desired result is reached.
• Break down complex SQL into simple manageable statements.
• Use temp tables (“#” or “t_”) to stage data between query steps if necessary. Index temp tables when needed.
• Review query results carefully to make sure they are what you expect. Pay special attention to issues related to: NULLs, missing data, dups and any anomalies.
• The way NOT to write new SQL is to launch large complex integrated queries against poorly understood objects of unknown size.

OVERVIEW OF T-SQL BATCH PROCESSING

• PARSE: Read the SQL supplied by the client and create an internal data structure called a ‘parse tree’. Perform syntax checks.
• NORMALIZE: The ‘parse tree’ is reorganized for greater efficiency. Existence checks and permissions checks are performed.
• COMPILE: Resolve views, process constraints, flatten subqueries, optimize queries and form the query plan.
• EXECUTE: Run the query plan and send results to the client.

OVERVIEW OF T-SQL COMPILATION - PHASES OF OPTIMIZER:

• QUERY ANALYSIS
  o Find SEARCH clauses
  o Find OR clauses
  o Find JOIN clauses
• INDEX SELECTION
  o Select best index for each clause
• JOIN SELECTION
  o Determine JOIN ORDER
  o Estimate costs and pick best plan
• ‘Cost based’ versus ‘Rule based’ optimizers

USING ‘SHOWPLAN’ AND ‘TRACEON’ TO FOLLOW THE SYBASE OPTIMIZER

• set statistics time on/off [shows query timing in milliseconds]
• set statistics io on/off [shows I/O per table: scan count, logical reads and physical reads]
• set showplan on/off [shows the query plan]
• set noexec on/off [shuts off query execution; use last, otherwise other commands won't work!]
• dbcc traceon/traceoff(3604) [sends trace output to current session]
• dbcc traceon/traceoff(302) [info on index selection]
• dbcc traceon/traceoff(310) [info on join selection]
• The appendices of Client/Server Data Design with Sybase by G. Anderson have long lists of dbcc and trace commands. May not be supported…test in development!

COMMON SQL PROBLEMS - 1 NOT USING AN INDEX:

• In most cases, an index should be used for each table in the query. Generally, when an index isn’t used the entire table is scanned. This is bad :>( Avoid this.
• Know what tables are indexed and how. Understand how/when indexes are used.
• Use ‘showplan’ to confirm expectations about index usage.
• If an apparently obvious index was not used…understand why.
• Understand how certain predicate constructions prevent use of an index. [More slides on this.]
• If you think your query should be using an index and ‘showplan’ indicates that it is not, ask a DBA to review the problem.

COMMON SQL PROBLEMS - 2 QUERY IS TOO COMPLEX:

• Avoid joining too many tables.
  o Max is about 6-7 tables on IRF2, but only if joined properly on narrow clustered index keys.
  o Avoid joining more than 3 really big tables (> 10^6 rows).
  o Much depends on the indexes used and the efficiency of those indexes.
• Avoid excessively complex predicates.
  o It is easy to write predicates that prevent proper index usage. [More slides on this.]
  o Avoid multiple OR operators in predicates. [More slides on this.]
  o Avoid more than one subquery in a predicate.
  o Avoid combining two or more special clauses like ‘GROUP BY’ with an aggregate and/or ‘ORDER BY’ and/or ‘COMPUTE BY’ and/or ‘HAVING’ et cetera.

COMMON SQL PROBLEMS - 3 UNREALISTIC EXPECTATIONS:

• Some properly formed queries legitimately ask the RDBMS server to do a lot of work and may take time to execute.
• Know the size of the objects in the query and try to understand how much work is being requested.
• Start queries off more simply with fewer tables and/or simpler or more restrictive predicates to develop a performance baseline.
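The ‘showplan’ and ‘traceon’ commands listed earlier can be combined into a session script to confirm index usage, as problem 1 recommends. This sketch compares the two SLAcct10 queries from the specific tip on functions, and it assumes that table and index exist. dbcc traceon typically requires DBA (sa_role) privileges, so try it in development first.

```sql
-- Sketch: diagnostics for the current isql session.
set showplan on
set statistics io on
set statistics time on
dbcc traceon(3604)          -- route trace output to this session
dbcc traceon(302, 310)      -- index selection and join selection details
go

-- Query under study: expected to use the index on SL10_SLAcct10_Num.
select * from SLAcct10 where SL10_SLAcct10_Num = '4272036300'
go

-- Plan only, no execution: turn noexec on LAST, because while it is on
-- no further commands are executed until 'set noexec off'.
set noexec on
go
select * from SLAcct10 where ltrim(SL10_SLAcct10_Num) = '4272036300'
go
set noexec off
go

-- Restore the session.
dbcc traceoff(302, 310)
dbcc traceoff(3604)
set showplan off
set statistics io off
set statistics time off
go
```

The second query wraps the column in ltrim(), so its showplan output is expected to report a table scan rather than an index.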

COMMON SQL PROBLEMS - 4 UNEXPECTED RESULTS - POSSIBLE CAUSES:

• NULLs or missing data.
• Missing keys.
  o An equijoin between tables A and B fails to return a ‘join row’ because table A has a qualifying row and table B is EXPECTED to have a qualifying row…but it does not. That is, the relationship is 1 to 0/1/N.
  o Keys may be absent due to missing data.
  o Keys may be absent due to synchronization (timing) issues between different domains of data.
  o Keys may be absent due to ‘matching’ (incompatibility) issues between different domains of data.
• Duplicate rows.
• Use or misuse of the ‘DISTINCT’ keyword.
• Improper use of aggregates with ‘GROUP BY’.
• Sorting behavior varies according to datatype.

COMMON SQL PROBLEMS - 5 USE OF CURSORS:

• Keep it simple…never use cursors.
• Never use cursors when set-based SQL will suffice.
• Use cursors ONLY when absolutely necessary. There are always unpredictable performance consequences to the use of cursors.
• If cursors must be used, acquire a large stock of garlic, crucifixes and wooden stakes.
• When using cursors, be attentive to transaction blocking issues.
• Some cursor operations require specific types of indexes to support them (like unique or clustered).

WHAT ARE SARGs?

• A SARG (Search ARGument) is a WHERE clause specification that can be used to determine a path to data by matching an index key and determining the use of an index.
• SARGs have this form:
  o <column> <operator> <expression>
  o <expression> <operator> <column>
  o <column> is null
• IMPORTANT: ‘column’ must be a bare column name. It can NOT be wrapped in a function, expression or concatenation if the clause is to be used as a SARG!
• Permitted operators:
  o =, >, <, >=, <=, !>, !<, <>, !=, is null
• If expression is a constant -> can use distribution vals in index statistics
• If expression is evaluated at runtime -> only use density from distribution page
• Non-equality operators special case -> can use covering non-clustered index for non-matching index scan
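The ‘missing keys’ cause of unexpected results can be checked directly. This sketch uses two hypothetical temp tables, #acct and #txn. The equijoin silently drops the transaction whose key has no parent row, and a NOT EXISTS check finds it. This is one of the cases where a negation really is needed.

```sql
-- Hypothetical parent (#acct) and child (#txn) tables keyed on acct_no.
create table #acct (acct_no char(10) not null)
create table #txn  (txn_id int not null, acct_no char(10) not null)
go
insert #acct values ('0000000001')
insert #acct values ('0000000002')
insert #txn  values (1, '0000000001')
insert #txn  values (2, '0000000003')    -- no matching #acct row
go

-- Equijoin: returns only txn 1; txn 2 silently disappears.
select t.txn_id, a.acct_no
from #txn t, #acct a
where t.acct_no = a.acct_no
go

-- Find the child rows whose key is missing from the parent table.
select t.txn_id, t.acct_no
from #txn t
where not exists (select 1 from #acct a where a.acct_no = t.acct_no)
go
```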

SARG EQUIVALENTS

Some things that can be converted to SARGs:
• BETWEEN statement converted to >= and <= clause
• LIKE statement converted to >= and < queries using alpha sort order. The statement “LIKE ‘Univ%’” is a SARG. The statement “LIKE ‘%Dept’” is not.

SARG EXAMPLES

• Valid SARG examples (i.e. uses an index if available):
  o production = “RED BLACK & BOO” (col = constant)
  o production like “RED%” [becomes: >= “RED” and < “REE”]
  o ((admission > 8.00) AND (admission < 12.00))
  o salary = 12 * $100000.00 [uses density, but not distribution]
  o salary = @yearly_sal [uses density, but not distribution]
• The following are NOT SARGs (i.e. will not use index):
  o substring(production, 1, 3) = “RED” [col used in function!]
  o convert(integer, acct6) = 20354 [col used in function]
  o production like “%BOO” [must lead like string with a char]
  o salary * 12 = $1200000.00 [col used in expression]
  o A.last_salary = A.cur_sal [col = col not valid SARG]

JOINS - INDEX USAGE

• There should always be indexes on keys, and the optimizer will usually use them for joins between tables on their primary or foreign keys.
• There are caveats:
  o The datatypes of columns being joined MUST BE THE SAME. If a column must be converted (even implicitly in some cases) to process the join, an index on that column can not be used.
  o Functions, expressions or concatenations used against a join column will prevent an index on that column from being used.
• In cases where one of the columns must be converted in a join, try moving the datatype conversion (the ‘CONVERT’ statement) to the join column associated with the SMALLER of the two tables. That is, force the scan to the smaller table.

JOIN ORDER

• Join order can make a big difference in query performance.
• If there are more than 4 tables in a join, the optimizer determines the ‘best’ join order by costing 4 table subsets and progressively selecting the outermost join tables. In these cases, the Sybase optimizer may not determine the best join order.
• A Sybase ‘SET’ option (‘SET table count’) can increase the number of tables considered when costing joins, up to 8 tables. Optimization time may increase significantly. This ‘SET’ option is rarely needed and should be used with care.
• In rare cases when needed, join order can be ‘hard wired’ with ‘SET forceplan on’. This is rarely needed and should be used with care.
• Info regarding join order for a query can be obtained through ‘showplan’ and the 310 trace.
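Several of the NOT-SARG examples above can be rewritten so that the column stands alone. The sketch below uses a hypothetical #emp table with column names borrowed from the slide. Run it with showplan on and compare the plans. Two caveats apply. With only a couple of rows, the optimizer may still prefer a table scan. Moving arithmetic to the constant side is also only equivalent when the arithmetic is exact.

```sql
-- Hypothetical table; column names follow the SARG EXAMPLES slide.
create table #emp (emp_id     int         not null,
                   production varchar(30) not null,
                   salary     money       not null)
go
insert #emp values (1, 'RED BLACK & BOO', $100000.00)
insert #emp values (2, 'BLUE MOON', $95000.00)
go
-- Index after loading, so the index statistics reflect the data.
create index emp_salary on #emp (salary)
create index emp_production on #emp (production)
go
set showplan on
go
-- NOT a SARG: salary is inside an expression.
select emp_id from #emp where salary * 12 = $1200000.00
-- SARG: the arithmetic moved to the constant side (density only, per the slide).
select emp_id from #emp where salary = $1200000.00 / 12
-- NOT a SARG: production is inside a function.
select emp_id from #emp where substring(production, 1, 3) = 'RED'
-- SARG: LIKE with a leading character.
select emp_id from #emp where production like 'RED%'
go
set showplan off
go
```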

SPECIFIC T-SQL TIPS - 1

• The optimizer won’t use a ‘composite’ index UNLESS you have a valid SARG against (at least) the first component of the ‘composite’ index.
  o NOTE: A composite index is a single index constructed across more than one table column.
• EXAMPLE:
  o CREATE index Acct_Ind on Acct (acct, sub, l)
  o Select * from Acct where sub = 5400 [This query will NOT use the composite index ‘Acct_Ind’.]

SPECIFIC T-SQL TIPS - 2

• The use of functions, expressions or concatenation on a column (in a predicate) will prevent use of an index on that column.
• EXAMPLE:
  o select * from SLAcct10 where SL10_SLAcct10_Num = '4272036300' [Uses index on SL10_SLAcct10_Num]
  o select * from SLAcct10 where ltrim(SL10_SLAcct10_Num) = '4272036300' [Will NOT use index on SL10_SLAcct10_Num]
• Showplan demo on queries

SPECIFIC T-SQL TIPS - 3

• Provide as much valid information in WHERE clauses as possible for the optimizer to consider:
  o Provide all possible joins for the optimizer to review. For example, when joining three tables A, B, and C, specify the join from A to B, the join from B to C AND the join from A to C if valid.
  o Supply as many valid SARGs as possible. In particular, it may be useful to provide redundant SARGs for the same column present in each of two joined tables.

SPECIFIC T-SQL TIPS - 4

• Avoid datatype mismatches between join columns or across the <operator> in SARGs. An index will often not be used in these cases.

SPECIFIC T-SQL TIPS - 5

• When using the ‘LIKE’ operator, make sure the wildcard string starts with at least one character before the wildcard character.
  o The statement: president LIKE “%linton” will NOT use an index.
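The tip about providing all possible joins and redundant SARGs can be sketched with three hypothetical temp tables, #A, #B and #C, that share an acct_no column. The extra join and the extra SARGs do not change the result. They only give the optimizer more join orders and more index choices to cost.

```sql
-- Hypothetical tables sharing the join column acct_no.
create table #A (acct_no char(10) not null, name   varchar(20) not null)
create table #B (acct_no char(10) not null, amount money       not null)
create table #C (acct_no char(10) not null, branch varchar(20) not null)
go
insert #A values ('4272036300', 'Smith')
insert #A values ('1111111111', 'Jones')
insert #B values ('4272036300', $250.00)
insert #C values ('4272036300', 'Downtown')
go

select a.name, b.amount, c.branch
from #A a, #B b, #C c
where a.acct_no = b.acct_no
  and b.acct_no = c.acct_no
  and a.acct_no = c.acct_no          -- redundant join: A to C directly
  and a.acct_no = '4272036300'
  and b.acct_no = '4272036300'       -- redundant SARG on joined table B
  and c.acct_no = '4272036300'       -- redundant SARG on joined table C
go
```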

SPECIFIC T-SQL TIPS - 6 TRANSACTION BLOCKS:

• A ‘transaction block’ performs one or more data modification statements (including delete, insert and update) between a ‘BEGIN TRANSACTION’ / ‘COMMIT’ or a ‘BEGIN TRANSACTION’ / ‘ROLLBACK’ statement pair.
• Use transaction blocking judiciously. It is very important for transaction integrity when needed, but potentially a performance bottleneck when used inappropriately.
• Keep transactions short.
• User input or interaction should not be planned/expected within a ‘transaction block’.
• Be mindful of the objects that may be locked during a transaction block and what type of locks those might be.

SPECIFIC T-SQL TIPS - 7 UPDATES:

• Updates are the most expensive SQL operations.
• Updates come in two general flavors:
  o Direct updates (including ‘in-place’, ‘cheap direct updates’ and ‘expensive direct updates’) are usually cheaper.
  o Deferred updates are usually more expensive.
• The type of update depends primarily on the column(s) being updated, the datatype of the updated columns and the type of indexes available.
• Sometimes, revising the indexing structure on a table or the datatypes of updated columns can change a slower ‘deferred’ update to a speedier ‘direct’ update.
• Showplan will indicate whether a ‘direct’ or ‘deferred’ update is being executed.

SPECIFIC T-SQL TIPS - 8

• Include only the columns needed in the SELECT LIST (as opposed to SELECT *). This reduces the data sent back to the client and provides the possibility of ‘index covering’.
  o A SELECT query is ‘covered by an index’ when an index exists on the table that includes all the columns the query references (in the SELECT LIST and in the predicate).
  o Modest sized high volume ‘projections’ can often be made to perform better by creating a ‘covering index’.

SPECIFIC T-SQL TIPS - 9

• The use of >= or <= can provide I/O advantages relative to > or <, especially when constructing SARGs against columns with low selectivity (a relatively small number of distinct values).
  o In SARGs of the form col > ‘constant’, the index finds ‘constant’ quickly, but then sequentially scans pages until the next higher value is found.
  o If the equivalent col >= ‘next higher value’ is used instead (for example, col >= 4 rather than col > 3 on an integer column), the index positions directly on the first qualifying row and fewer pages will be scanned.
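The >= versus > tip is easiest to see on an integer column, where col > 3 and col >= 4 select exactly the same rows. This sketch uses a hypothetical #orders table. Compare the ‘logical reads’ reported by statistics io for the two forms on realistic data volumes, because a handful of rows will not show a difference. The rewrite is only valid for datatypes with no values between the constant and the next one, such as integers.

```sql
-- Hypothetical table with a low-selectivity integer column.
create table #orders (order_id int not null, status_code int not null)
go
insert #orders values (1, 2)
insert #orders values (2, 3)
insert #orders values (3, 3)
insert #orders values (4, 4)
insert #orders values (5, 5)
go
create index orders_status on #orders (status_code)
go
set statistics io on
go
-- Positions on the first 3, then reads past every 3 before reaching a 4.
select order_id from #orders where status_code > 3
-- Same rows for an integer column, but positions directly on the first 4.
select order_id from #orders where status_code >= 4
go
set statistics io off
go
```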

SPECIFIC T-SQL TIPS - 10 PARAMETERS TO SPs:

• A parameter to an SP (Stored Procedure) that gets used within the SP in the form <column> <operator> <@param> must be the same datatype as the column in order to be useful as a SARG.
• Parameters to SPs are known at execute/compile time. However, the values of variables declared and assigned within an SP are not known when the SP's query plan is compiled. Sometimes, this situation can be improved as in the following example (from Paulsell’s P&T book):
  o Split this SP:
      CREATE PROCEDURE p AS
      DECLARE @x int
      SELECT @x = col FROM tab WHERE …
      SELECT * FROM tab2 WHERE indexed_col = @x   [can’t be optimized]
  o Into these two SPs (create select_proc first, since base_proc calls it):
      CREATE PROCEDURE select_proc @x int AS
      SELECT * FROM tab2 WHERE indexed_col = @x   [can be optimized]
      go
      CREATE PROCEDURE base_proc AS
      DECLARE @x int
      SELECT @x = col FROM tab WHERE …
      EXEC select_proc @x
      go

SPECIFIC T-SQL TIPS - 11 SELECT INTO:

• The SELECT INTO operation (SELECT * INTO new_table FROM …) is very fast, much faster than creating a table followed by an INSERT…SELECT statement. SELECT INTO creates a new table (on the fly) based on the columns in the SELECT LIST and the restrictions in the predicate.
• SELECT INTO is minimally logged. ‘SELECT INTO’ is enabled in most of our DSS environments (but not in our transaction processing environments). When this option is turned on, DBs can not be recovered from transaction log dumps.
• ‘SELECT INTO’ can populate either “#” temporary tables (which last only for the current session) or regular tables.
• Use SELECT INTO to quickly move a subset of data into a smaller table for further SQL processing. It can be part of a strategy for breaking large complex SQL into stepwise parts.
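The SELECT INTO tip combines naturally with the working-method advice to stage data in temp tables. In this sketch, SELECT INTO copies only the large transactions from a hypothetical #txn_all source table into a session temp table, which is then indexed and queried. Selecting into a ‘#’ temp table is normally permitted even where ‘SELECT INTO’ is disabled in the user database, but confirm this with a DBA for your environment.

```sql
-- Hypothetical source table, standing in for a large transaction table.
create table #txn_all (acct_no char(10) not null, amount money not null)
go
insert #txn_all values ('0000000001', $50.00)
insert #txn_all values ('0000000001', $12000.00)
insert #txn_all values ('0000000002', $15000.00)
insert #txn_all values ('0000000002', $30000.00)
go
-- Step 1: stage only the rows needed into a new session temp table.
select acct_no, amount
into #big_txn
from #txn_all
where amount >= $10000.00
go
-- Step 2: index the staged table for the next step.
create index big_txn_acct on #big_txn (acct_no)
go
-- Step 3: work against the smaller table.
select acct_no, total_amount = sum(amount), txn_count = count(*)
from #big_txn
group by acct_no
go
```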

SPECIFIC T-SQL TIPS - 12

• Rewrite SQL to use ‘EXISTS’ and ‘IN’ in subqueries and IF statements instead of ‘NOT EXISTS’ and ‘NOT IN’. Sybase can return TRUE as soon as a single row matches for ‘EXISTS’ and ‘IN’, but must read all values for the negations. Use the negations only if really needed.

SPECIFIC T-SQL TIPS - 13 EXISTENCE CHECKS:

• It is a good practice to check for the existence of an object in a SQL batch or SP before performing SQL work against the object. This is usually done something like the following example:
  o IF EXISTS (SELECT 1 FROM sysobjects WHERE name = “SLAcct6” AND type = “U”) …do something
• Using ‘SELECT 1’ in existence subqueries is better than ‘SELECT *’ since it may result in less locking of system tables.
• Don’t use the COUNT aggregate to perform an existence check as in: SELECT * FROM table WHERE 0 < (SELECT COUNT(*) FROM table2 WHERE …). The COUNT may cause a table scan or index scan. Instead perform an existence check: SELECT * FROM table WHERE EXISTS (SELECT 1 FROM table2 WHERE …).

SPECIFIC T-SQL TIPS - 14 USING OR WITH SARGS:

• Use of OR between SARGs or join clauses can be expensive.
• SARGs can be combined with OR in two ways:
  o col1 = <val1> OR col1 = <val2> … (‘IN’ clauses are always reduced to this form.)
  o col1 = <val> OR col2 = <val>
• In the 2nd form of OR, a table scan must be used unless ALL of the columns are indexed and ALL of the SARGs are properly formed.
• If indexes can be used on all the columns in the OR, the optimizer uses a special ‘OR STRATEGY’ or multiple matching index scans. The ‘OR STRATEGY’ entails creation of a special sorted worktable.
• In cases where the table must be scanned because there are no appropriate SARGs or indexes, avoid long OR lists or long IN (val1, val2, … valn) lists, since all pages will be locked for the duration of the statement execution.

SPECIFIC T-SQL TIPS - 15 USING OR WITH JOINS:

• Join statements combined with OR (like t1.a = t2.b OR t1.c = t2.d) can NOT be optimized and may cause Cartesian cross-products. Try rewriting the statement to use a UNION. Be attentive to the treatment of duplicates.
• Example:
    SELECT * FROM t1, t2 WHERE t1.a = t2.b OR t1.c = t2.d
  REWRITE AS:
    SELECT * FROM t1, t2 WHERE t1.a = t2.b
    UNION <or UNION ALL>
    SELECT * FROM t1, t2 WHERE t1.c = t2.d
• The separate SELECTs in the UNION can each be optimized.

SPECIFIC T-SQL TIPS - 16 MIN AND MAX AGGREGATES:

• Special optimizations apply to the ‘MIN’ and ‘MAX’ aggregates applied to indexed columns. These optimizations can’t be used if:
  o The column is part of an expression or function.
  o The column is not the first column of an index.
  o There is another aggregate in the query.
  o A ‘GROUP BY’ clause is used.
• The following query can NOT use the special MIN/MAX optimizations because more than one aggregate is being used:
    SELECT MIN(SL10_SLAcct10_Num), MAX(SL10_SLAcct10_Num) FROM SLAcct10
  Split the query into a separate MIN and MAX query.

SPECIFIC T-SQL TIPS - 17 SUBQUERIES:

• Some queries with subqueries can be rewritten as joins with better performance. Review the following example:
    SELECT col FROM t1 WHERE keycol IN (SELECT col FROM t2)
  Can sometimes be rewritten as:
    SELECT t1.col FROM t1, t2 WHERE t1.keycol = t2.col
• Review issues of uniqueness and dups when ‘flattening’ subqueries into the main query.

SPECIFIC T-SQL TIPS - 18 BOOLEAN EXPRESSIONS:

• When ‘ANDing’ Boolean expressions (like @variable = “Lower Interest Rates”), put the expression MOST LIKELY TO FAIL first. This can save time in evaluating the others.
• When ‘ORing’ Boolean expressions, put the expression MOST LIKELY TO SUCCEED first.
• These considerations are most likely to be important in the context of conditional testing within SQL WHILE loops.

SPECIFIC T-SQL TIPS - 19

• T-SQL, unlike more complete programming languages with advanced optimizing compilers (like ‘C’), does NOT eliminate ‘loop invariants’. That is, if you have code within a ‘WHILE’ loop inside a Stored Procedure that performs a static calculation producing a result that doesn’t change for each execution of the loop, the T-SQL optimizer will not move the calculation outside the loop.
• Don’t perform invariant static computations inside a SQL ‘WHILE’ loop, or the server will recalculate the computation for each loop execution.

SPECIFIC T-SQL TIPS - 20 A PLUG FOR SPs:

• Create Stored Procedures for queries which will be used repeatedly with only slight variation (that is, queries that can be parameterized).
• A Stored Procedure is precompiled: the query plan is prepared when the procedure is first executed and is available for reuse thereafter.
• Stored Procedures reduce network traffic, can help with security (data access) issues, can be available to all clients and have other advantages.

SPECIFIC T-SQL TIPS - 21 A PLUG FOR UPDATE STATS:

• Accurate density and distribution statistics are essential for accurate optimization of queries.
• If substantial modification has been made to a table (update, insert, delete), make sure ‘UPDATE STATISTICS’ is run for the table.
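The loop-invariant tip can be shown with a WHILE condition. A subquery in that condition is re-evaluated on every pass, so the sketch computes the invariant once before the loop. The table #work is hypothetical. The loop body is deliberately trivial: real code that only adds up a column should simply use sum(), in line with the advice against cursor-style processing.

```sql
-- Hypothetical work table.
create table #work (id int not null, qty int not null)
go
insert #work values (1, 10)
insert #work values (2, 20)
insert #work values (3, 30)
go
declare @id int, @max_id int, @total int

-- Avoid:  while @id <= (select max(id) from #work)
-- That subquery would be re-run on every pass of the loop.

-- Better: compute the loop invariant once, before the loop.
select @max_id = max(id) from #work

select @id = 1, @total = 0
while @id <= @max_id
begin
    select @total = @total + qty from #work where id = @id
    select @id = @id + 1
end

select total_qty = @total
go
```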

SQL TOPICS FOR ANOTHER DAY...

• A detailed review of ‘showplan’ and ‘trace’ output
• Nuances of correlated and non-correlated subqueries and subquery performance issues
• A detailed review of index types and their characteristics
• Tips for proper use of aggregates with ‘GROUP BY’ and ‘HAVING’ clauses
• Infrequently used SQL statements: CASE, UNION, COALESCE
• Pivoting and crosstabs with SQL
• Working with row-wise maximums and minimums
• Hardwiring query plans
• Nuances of creating objects with Stored Procedures

THAT’S ALL FOLKS!

• Thanks for your Time!
• Bring your SQL questions to Don or Jerome or any of the DBAs for help and insights.