Writing Good SQL

WRITING GOOD SQL STRATEGIES, TIPS & TRICKS
OVERVIEW
Introductory remarks - SQL, T-SQL, versions

Places to go for more information
What you need to know to write good SQL
Working Method
Overview of query batch processing
Overview of Optimizer
Using showplan and traceon/traceoff
General problem scenarios
Specific SQL Tips
Demos, examples
CONTESTS WITH PRIZES!
SQL WHAT?
SQL = Structured Query Language

Developed at IBM's San Jose research lab in the 1970s
Now ANSI and ISO standard for RDBMS SQL
o ANSI = American National Standards Institute
o ISO = International Organization for Standardization
SQL86=>SQL89=>SQL92
Entry, Intermediate and Full compliance
Embedded, Native, Dynamic
DDL (Data Definition Language), DML (Data Manipulation Language), SAL (System Administration
Language) and QL (Query Language - what we'll mostly cover)
Use Sybase SET options for Entry SQL92 compliance
Focus today on Transact-SQL not PL/SQL
INFORMATION SOURCES ON SQL - SYBOOKS (in-house)
Sybase ASE Release 11.5 Collection::Transact-SQL User's Guide
Sybase ASE Release 11.5 Collection::Sybase ASE Performance & Tuning Guide,
Chapter 11
INFORMATION SOURCES ON SQL - BOOKS (in-house)
A Guide to the SQL Standard by C. J. Date (4/97 avail)

Sybase SQL Server Performance and Tuning Guide by Karen Paulsell, chapter 10
(similar content in SYBOOKS)
Sybase SQL Server Performance Tuning by Shaibal Roy and Marc B. Sugiyama, pages
194-205
Optimizing Transact-SQL: Advanced Programming Techniques by Rozenshtein,
Abramovich & Birger
The Database Expert's Guide to SQL by Frank Lusardi (now getting a little dated but
still useful)
INFORMATION SOURCES ON SQL - ARTICLES (in-house)
The Essence of SQL: A Guide to Learning Most of SQL in the Least Amount of Time by
David Rozenshtein in SQL Forum Vol 4 No 4, whole issue (later a book) [Excellent
short introduction to SQL]
Performance Tips for Transact-SQL, slides from a presentation by Jeff Lichtman
Subquery Processing Performance Improvements, slides from a presentation by Jeff
Lichtman
INFORMATION SOURCES ON SQL - Web (accessible)
http://www.sybase.com [main Sybase site]

http://www.isug.com [International Sybase Users Group]
http://www.edbarlow.com [Ed Barlow's Sybase stuff site]
http://uccbt.uchicago.edu/cbtweb/english/cbtweb/index.htm => SQL
Programming: Database Queries (4 hr)
http://uccbt.uchicago.edu/cbtweb/english/cbtweb/index.htm => Oracle
Introduction, several sections
http://www.oracle.com [Main Oracle site]
http://technet.oracle.com [Oracle Technology Network - white papers, tech info]
http://www.ioug.org [International Oracle Users Group. Tech stuff, product info, white
papers, code snips, IOUG master class training et cetera]
http://www.ixora.com.au [Ixora - Oracle on Unix, internals. An Australian site run by Steve
Adams, an expert on Oracle internals]
http://www.oracle.com/education/oln [Oracle's online Web training site]
INFORMATION SOURCES ON SQL - NEW BOOKS
Oracle PL/SQL Tips & Techniques (Oracle Series) by Joseph C. Trezzo (the definitive
'Tips' book on PL/SQL)
Oracle8 PL/SQL Programming by Urman and Rinaldi
1001 SQL Tips by Konrad King (due out in April 2001)
SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL by M. J.
Hernandez
SQL for Smarties: Advanced SQL Programming by Joe Celko
SQL: The Complete Reference (Osborne's complete reference series) by Groff and
Weinberg
The Guru's Guide to Transact-SQL by Kenneth W. Henderson
WHAT DO YOU NEED TO KNOW TO WRITE GOOD SQL?
YOU NEED TO KNOW A LOT:
Understand the data objects you'll use (esp. tables, views, indexes, keys)
Understand the relationships between the data objects you'll use (esp. join
opportunities, cardinality)
Understand the meaning of the data you'll use (meaning, values, NULLs, missing
data, uniqueness)
Understand how much data you are querying or generating
Understand the SQL syntax
Understand the report requirements or the SQL processing that is needed
Understand how the SQL optimizer works
Understand how your SQL interacts with a wrapping (host) language, if present
HOW TO WRITE SQL - GOOD WORKING METHODS
Proceed incrementally, iteratively and empirically

Have some idea what to expect: how long it should take and what work you are asking
the server to accomplish.
Use progressively more complex prototypical SQL, examining results after each
query, until you reach the desired result.
Break down complex SQL into simple manageable statements
Use temp tables (# or t_) to stage data between query steps if necessary. Index
temp tables when needed. [Most of the time!]
Review query results carefully to make sure they are what you expect. Pay special
attention to issues related to: NULLs, missing data, dups and any anomalies.
The way NOT to write new SQL is to launch large complex integrated queries
against poorly understood objects of unknown size, indexing, key structure
et cetera.
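The staged approach above can be sketched roughly as follows. This is only an illustration: the tables `Ledger` and `Acct` and all column names are hypothetical, and each step is its own batch so the results can be checked before moving on.

```sql
-- Step 1: find out how much data you are about to touch.
select count(*) from Ledger where post_date >= '19990101'
go

-- Step 2: stage the subset of interest in a temp table.
select acct, post_date, amount
into #ledger_99
from Ledger
where post_date >= '19990101'
go

-- Step 3: index the temp table on the column you will join on.
create index ledger_99_ix on #ledger_99 (acct)
go

-- Step 4: check for NULLs and duplicates before building on the result.
select count(*) from #ledger_99 where amount is null
select acct, post_date, count(*)
from #ledger_99
group by acct, post_date
having count(*) > 1
go

-- Step 5: only now run the more complex query against the staged data.
select a.acct, a.acct_name, sum(l.amount)
from Acct a, #ledger_99 l
where a.acct = l.acct
group by a.acct, a.acct_name
go
```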
OVERVIEW OF T-SQL BATCH PROCESSING
PARSE: Read the SQL supplied by the client and create an internal data structure
called a parse tree. Perform syntax checks.
NORMALIZE: The parse tree is reorganized for greater efficiency. Existence checks
and permissions checks are performed.
COMPILE: Resolve views, flatten subqueries, process constraints; optimize queries and
form query plan.
EXECUTE: Run query plan and send results to client.
OVERVIEW OF T-SQL COMPILATION
PHASES OF OPTIMIZER:
QUERY ANALYSIS
o Find SEARCH clauses
o Find OR clauses
o Find JOIN clauses
INDEX SELECTION
o Select best index for each clause
JOIN SELECTION
o Determine JOIN ORDER
o Estimate costs and pick best plan
Cost based versus Rule based optimizers
USING SHOWPLAN AND TRACEON TO FOLLOW THE SYBASE OPTIMIZER
set statistics time on/off [shows query timing in milliseconds]

set statistics io on/off [shows quantity of I/O in arbitrary units]
set showplan on/off [shows the query plan]
set noexec on/off [shuts off query execution; use last otherwise other commands
won't work!]
dbcc traceon/traceoff(3604) [sends trace output to current session]
dbcc traceon/traceoff(310) [info on join selection]
dbcc traceon/traceoff(302) [info on index selection]
The appendices of Client/Server Data Design with Sybase by G. Anderson have long
lists of dbcc and trace commands. These may not be supported; test in development!
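A typical session using the commands above might look like the sketch below. The query is a placeholder against a hypothetical `Acct` table with a numeric `acct` column, and the dbcc commands may require elevated privileges on your server.

```sql
-- Route trace output to this session, then ask for index (302)
-- and join (310) selection details.
dbcc traceon(3604)
dbcc traceon(302)
dbcc traceon(310)
go
set statistics io on
set showplan on
go
-- Turn noexec on LAST: commands issued after it are compiled but not run.
set noexec on
go
select * from Acct where acct = 1001
go
-- Restore the session. noexec must be turned off first.
set noexec off
go
set showplan off
set statistics io off
go
dbcc traceoff(310)
dbcc traceoff(302)
dbcc traceoff(3604)
go
```

With noexec on, the plan is displayed but the query returns no rows, so no I/O statistics are reported; turn noexec off and rerun to see timings and I/O.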
COMMON SQL PROBLEMS - 1
NOT USING AN INDEX:
In most cases, an index should be used for each table in the query. Generally, when
an index isn't used the entire table is scanned. This is bad :>(
Know what tables are indexed and how. Understand how/when indexes are used.
[More slides on this.]
Understand how certain predicate constructions prevent use of an index. [More slides
on this.]
Use showplan to confirm expectations about index usage.
If an apparently obvious index was not used, understand why.
If you think your query should be using an index and showplan indicates that it is
not, ask a DBA to review the problem.
QUERY IS TOO COMPLEX:
Avoid joining too many tables.

o Much depends on the indexes used and the efficiency of those indexes.
o Max is about 6-7 tables on IRF2, but only if joined properly on narrow
clustered index keys.
o Avoid joining more than 3 really big tables (> 10^6 rows)
Avoid excessively complex predicates
o It is easy to write predicates that prevent proper index usage. Avoid this.
[More slides on this.]
o Avoid combining two or more special clauses like GROUP BY
with an aggregate and/or ORDER BY and/or COMPUTE BY and/or HAVING et
cetera.
o Avoid multiple OR operators in predicates.
o Avoid more than one subquery in a predicate.
UNREALISTIC EXPECTATIONS:
Some properly formed queries legitimately ask the RDBMS server to do a lot of work
and may take time to execute.
Know the size of the objects in the query and try to understand how much work is
being requested.
Start with simpler queries (fewer tables and/or simpler or more restrictive
predicates) to develop a performance baseline.
UNEXPECTED RESULTS - POSSIBLE CAUSES:
NULLs or missing data.

Duplicate rows.
Missing keys.
o Keys may be absent due to missing data.
o Keys may be absent due to synchronization (timing) issues between different
domains of data.
o Keys may be absent due to matching (incompatibility) issues between
different domains of data.
An equijoin between tables A and B fails to return a row because table A has a
qualifying row and table B is EXPECTED to have a matching row, but it does not. That
is, the relationship is 1 to 0/1/N.
Improper use of aggregate with GROUP BY.
Use or misuse of DISTINCT keyword.
Sorting behavior varies according to datatype.
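Several of the causes above can be checked directly before trusting a join result. The following is a minimal sketch, assuming two hypothetical tables `A` and `B` joined on a column named `keycol`:

```sql
-- Rows in A with no partner in B: these silently drop out of an equijoin.
select a.keycol
from A a
where not exists (select 1 from B b where b.keycol = a.keycol)

-- Key values that occur more than once in B: these multiply join rows.
select b.keycol, count(*)
from B b
group by b.keycol
having count(*) > 1

-- NULL keys in A: NULL join columns do not match each other in a join.
select count(*) from A where keycol is null
```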
USE OF CURSORS:
When using cursors, acquire a large stock of garlic, crucifixes and wooden stakes.
Use cursors ONLY when absolutely necessary. There are always unpredictable
performance consequences to the use of cursors.
Never use cursors when set SQL will suffice.
If cursors must be used, be attentive to transaction blocking issues.
Some cursor operations require specific types of indexes to support them (like unique
or clustered).
Keep it simple: never use cursors.
WHAT ARE SARGs?
A SARG (Search ARGument) is a WHERE clause specification that can be used to
determine a path to data by matching an index key and determining the use of an
index.
SARGs have this form:
o <column> <operator> <expression>
o <expression> <operator> <column>
o <column> is null
IMPORTANT: <column> must be a bare column name. If the column is wrapped in a
function, expression or concatenation, the clause can NOT be used as a SARG!
Permitted operators:
o =, >, <, >=, <=, !>, !<, <>, !=, is null
If expression is a constant -> can use distribution values in index statistics
If expression is evaluated at runtime -> can only use density from the distribution page
Non-equality operators (<>, !=) are a special case -> can use a covering non-clustered index
for a non-matching index scan
SARG EQUIVALENTS
Some things that can be converted to SARGs:
BETWEEN statement converted to >= and <= clauses
LIKE statement converted to >= and < clauses using the alphabetic sort order. The statement
LIKE 'Univ%' is a SARG. The statement LIKE '%Dept' is not.
SARG EXAMPLES
Valid SARG examples (i.e. uses an index if available):
o production = 'RED BLACK & BOO' (col = constant)
o production like 'RED%' [becomes: >= 'RED' and < 'REE']
o ((admission > 8.00) AND (admission < 12.00))
o salary = 12 * $100000.00 [uses density, but not distribution]
o salary = @yearly_sal [uses density, but not distribution]
The following are NOT SARGs (i.e. will not use index):
o substring(production, 1, 3) = 'RED' [col used in function!]
o convert(integer, acct6) = 20354 [col used in function]
o production like '%BOO' [must lead like string with a char]
o salary * 12 = $1200000.00 [col used in expression]
o A.last_salary = A.cur_sal [col = col not valid SARG]
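Two of the non-SARG examples above can usually be rewritten so that the column stands alone. The sketch below assumes hypothetical tables `shows` (with a character column `production`) and `staff` (with a money column `salary`):

```sql
-- Function on the column: not a SARG.
select * from shows where substring(production, 1, 3) = 'RED'
-- A leading-prefix LIKE asks the same question and is a SARG.
select * from shows where production like 'RED%'

-- Arithmetic on the column: not a SARG.
select * from staff where salary * 12 = $1200000.00
-- Move the arithmetic to the constant side and precompute it.
select * from staff where salary = $100000.00
```

Run each pair with showplan on to confirm that the rewritten form actually picks up the index on your server.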
JOINS - INDEX USAGE
There should always be indexes on keys and the optimizer will usually use them for
joins between tables on their primary or foreign keys.
There are caveats:
o The datatypes of columns being joined MUST BE THE SAME. If a column
must be converted (even implicitly in some cases) to process the join, an
index on that column cannot be used.
o Functions, expressions or concatenations applied to a join column will
prevent an index on that column from being used.
In cases where one of the columns must be converted in a join, try moving the
datatype conversion (the CONVERT statement) to the join column associated with the
SMALLER of the two tables. That is, force the scan to the smaller table.
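As a rough sketch of pushing the conversion to the smaller table, assume a hypothetical large table `big_ledger` whose `acct_num` column is char(10) and indexed, and a small table `small_lookup` whose `acct_id` column is int:

```sql
-- Converting the big table's column would disable its index:
--   where convert(int, b.acct_num) = s.acct_id
-- Converting the small table's column leaves that index usable:
select b.acct_num, s.descr
from big_ledger b, small_lookup s
where b.acct_num = convert(char(10), s.acct_id)
```

The two forms only match the same rows if the stored strings carry no leading zeros or leading blanks, so compare row counts before relying on the rewrite.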
JOIN ORDER
Join order can make a big difference in query performance. Info regarding join order
for a query can be obtained through showplan and the 310 trace.
If there are more than 4 tables in a join, the Sybase optimizer may not determine the
best join order. In these cases, the optimizer chooses a join order by
costing 4-table subsets and progressively selecting the outermost join tables.
A Sybase SET option can increase the number of tables considered when costing
joins - up to 8 tables. Optimization time may increase significantly. This SET option
is rarely needed and should be used with care.
In rare cases when needed, join order can be hard wired with SET forceplan on. This
is rarely needed and should be used with care.
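For reference, the two session settings might be used as sketched below. This assumes the SET option referred to above is `set table count`; the table and column names are hypothetical.

```sql
-- Let the optimizer consider up to 8 tables at a time for this session.
set table count 8
go
-- ... run the multi-table join here and inspect it with showplan ...

-- Last resort: join the tables in the order they appear in the FROM clause.
set forceplan on
go
select a.acct, b.descr, c.amount
from small_a a, medium_b b, big_c c
where a.acct = b.acct
  and b.acct = c.acct
go
set forceplan off
go
```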
SPECIFIC T-SQL TIPS - 1
The optimizer won't use a composite index UNLESS you have a valid SARG against
(at least) the first component of the composite index.
o NOTE: A composite index is a single index constructed across more than one
table column.
EXAMPLE:
o CREATE index Acct_Ind on Acct (acct, sub, l)
o Select * from Acct where sub = 5400 [This query will NOT use the
composite index Acct_Ind.]
The use of functions, expressions or concatenation on a column (in a predicate) will
prevent use of an index on that column.
EXAMPLE:
o select * from SLAcct10 where SL10_SLAcct10_Num = '4272036300' [Uses
index on SL10_SLAcct10_Num]
o select * from SLAcct10 where ltrim(SL10_SLAcct10_Num) = '4272036300'
[Will NOT use index on SL10_SLAcct10_Num]
Showplan demo on queries
Provide as much valid information in WHERE clauses as possible for the optimizer to
consider:
o Provide all possible joins for the optimizer to review. For example, when
joining three tables A, B, and C; specify the join from A to B, the join from B
to C AND the join from A to C if valid.
o Supply as many valid SARGs as possible. In particular, it may be useful to
provide redundant SARGs for the same column present in each of two joined
tables.
Avoid datatype mismatches between join columns or across the <operator> in SARGs.
An index will often not be used in these cases.
When using the LIKE operator, make sure the wildcard string starts with at least one
character before the wildcard character.
o The statement: president LIKE '%linton' will NOT use an index.
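The advice above about supplying all valid joins and redundant SARGs can be illustrated with a three-table join. The tables `A`, `B` and `C` and their columns are hypothetical:

```sql
select a.acct, b.sub, c.amount
from A a, B b, C c
where a.acct = b.acct
  and b.acct = c.acct
  and a.acct = c.acct   -- implied by the two joins above, stated anyway
  and a.acct = 4272     -- SARG on A ...
  and b.acct = 4272     -- ... repeated for B, so either table can be entered via its index
```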
TRANSACTION BLOCKS:
Keep transactions short.

User input or interaction should not be planned/expected within a transaction block. A
transaction block performs one or more data modification statements (including
delete, insert and update) between a BEGIN TRANSACTION / COMMIT or a BEGIN
TRANSACTION / ROLLBACK statement pair.
Be mindful of the objects that may be locked during a transaction block and what type
of locks those might be.
Use transaction blocks judiciously. They are very important for transaction integrity when
needed, but potentially a performance bottleneck when used inappropriately.
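A short transaction block might look like the sketch below. The table `AcctBal` and its columns are hypothetical; the point is that all input is gathered before BEGIN TRANSACTION and nothing but the data modifications happens inside it.

```sql
-- Gather and validate input BEFORE the transaction starts.
declare @from_acct int, @to_acct int, @amt money
select @from_acct = 1001, @to_acct = 2002, @amt = $50.00

begin transaction
    update AcctBal set balance = balance - @amt where acct = @from_acct
    if @@error != 0
    begin
        rollback transaction
        return
    end

    update AcctBal set balance = balance + @amt where acct = @to_acct
    if @@error != 0
    begin
        rollback transaction
        return
    end
commit transaction
```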
UPDATES:
Updates are the most expensive SQL operations.

Updates come in two general flavors. The type of update depends primarily on the
column(s) being updated, the datatype of the updated columns and the type of
indexes available.
o Direct updates (including in-place, cheap direct updates and expensive
direct updates) are usually cheaper.
o Deferred updates are usually more expensive.
Showplan will indicate whether a direct or deferred update is being executed.
Sometimes, revising the indexing structure on a table or the datatypes of updated
columns can change a slower deferred update to a speedier direct update.
Include only the columns needed in the SELECT LIST (as opposed to SELECT *). This
reduces the data sent back to the client and provides the possibility of index
covering.
o A SELECT query is covered by an index when a composite index exists on the
table that includes all the columns the query references (in the SELECT LIST and the WHERE clause).
o Modest sized high volume projections can often be made to perform better by
creating a covering index.
The use of >= or <= can provide I/O advantages relative to > or <, especially
when constructing SARGs against columns with low selectivity (a relatively small
number of distinct values). In SARGs of the form col > constant, the index finds
constant quickly, but then sequentially scans pages until the next higher value is
found. If the equivalent col >= next_value form is used (for example, int_col >= 4
instead of int_col > 3), the index positions directly on the first qualifying row
and fewer pages will be scanned.
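The two tips above might be applied as sketched here, reusing the `Acct` table from the composite index example and assuming `sub` is an integer column. The index name `Acct_cov` is made up for illustration.

```sql
-- Covering: every column the query touches (sub, acct) is in the index,
-- so the data pages need not be read.
create index Acct_cov on Acct (sub, acct)
go
select acct from Acct where sub = 5400
go

-- For an integer column these return the same rows, but the second form
-- lets the index position directly on the first qualifying key.
select acct from Acct where sub > 5400
select acct from Acct where sub >= 5401
go
```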
PARAMETERS TO SPs:
A parameter to a SP (Stored Procedure) that gets used within the SP in the form
<column> <operator> <@param> must be the same datatype as column in order to
be useful as a SARG.
Parameters to SPs are known when the SP is compiled and optimized. However, the values of
variables declared within an SP are not known when its queries are optimized. Sometimes, this
situation can be improved as in the following example (from Paulsell's P&T book):
o Split this SP:
    CREATE PROCEDURE p AS
    DECLARE @x int
    SELECT @x = col FROM tab WHERE ...
    SELECT * FROM tab2 WHERE indexed_col = @x [can't be optimized]
o Into these two SPs:
    CREATE PROCEDURE base_proc AS
    DECLARE @x int
    SELECT @x = col FROM tab WHERE ...
    EXEC select_proc @x
    CREATE PROCEDURE select_proc @x int AS
    SELECT * FROM tab2 WHERE indexed_col = @x [can be optimized]
SELECT INTO:
The SELECT INTO operation (SELECT * INTO new_table FROM ...) is very fast, much
faster than creating a table followed by an INSERT...SELECT statement. SELECT INTO
creates a new table (on the fly) based on the columns in the SELECT LIST and the
restrictions in the predicate.
SELECT INTO is minimally logged. When it is turned on for a database, that database cannot
be recovered from transaction log dumps. SELECT INTO is enabled in most of our DSS
environments (but not in our transaction processing environments).
SELECT INTO can populate either # temporary tables (which last only for the
current session) or regular tables.
Use SELECT INTO to quickly move a subset of data into a smaller table for
further SQL processing. It can be part of a strategy for breaking large
complex SQL into stepwise parts.
Rewrite SQL to use EXISTS and IN in subqueries and IF statements instead of NOT
EXISTS and NOT IN. In cases where the table must be scanned because there are no
appropriate SARGs or indexes, Sybase can return TRUE as soon as a single row
matches for EXISTS and IN, but must read all values for the negations.
EXISTENCE CHECKS:
It is a good practice to check for the existence of an object in a SQL batch or SP
before performing SQL work against the object. This is usually done something like the
following example:
o IF EXISTS (SELECT 1 FROM sysobjects
  WHERE name = 'SLAcct6' AND type = 'U')
  ...do something.
Don't use the COUNT aggregate to perform an existence check as in: SELECT * FROM
table WHERE 0 < (SELECT COUNT(*) FROM table2 WHERE ...). Instead perform an
existence check: SELECT * FROM table WHERE EXISTS (SELECT 1 FROM table2
WHERE ...). The COUNT may cause a table scan or index scan.
Using SELECT 1 in existence subqueries is better than SELECT * since it may result
in less locking of system tables.
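A common use of the object existence check is a guarded drop at the top of a script. The table name `SLAcct6_stage` below is hypothetical:

```sql
-- Drop the staging table only if a user table of that name exists.
if exists (select 1 from sysobjects
           where name = 'SLAcct6_stage' and type = 'U')
begin
    drop table SLAcct6_stage
end
go
```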
USING OR WITH SARGS:
Use of OR between SARGs or join clauses can be expensive. Use them only if really
needed.
SARGs can be combined with OR in two ways:
o col1 = <val1> OR col1 = <val2>. IN clauses are always reduced to this.
o col1 = <val> OR col2 = <val>
In the 2nd form of OR, a table scan must be used unless ALL of the columns
are indexed and ALL of the SARGs are properly formed. If indexes can be used
on all the columns in the OR, the optimizer uses a special OR STRATEGY or multiple
matching index scans. The OR STRATEGY entails creation of a special sorted
worktable.
Avoid long OR lists or long IN (val1, val2, ..., valn) lists since all pages will be locked for
the duration of the statement execution.
USING OR WITH JOINS:

Join statements combined with OR (like t1.a = t2.b OR t1.c = t2.d) can NOT
be optimized and may cause Cartesian cross-products. Try rewriting the
statement to use a UNION. Be attentive to the treatment of duplicates: UNION
removes duplicate rows, while UNION ALL returns a row twice when both join
conditions match.
Example:
    SELECT * FROM t1, t2
    WHERE t1.a = t2.b OR t1.c = t2.d
REWRITE AS:
    SELECT * FROM t1, t2
    WHERE t1.a = t2.b
    UNION <or UNION ALL>
    SELECT * FROM t1, t2
    WHERE t1.c = t2.d
The separate SELECTs in the UNION can each be optimized.
MIN AND MAX AGGREGATES:
Special optimizations apply to the MIN and MAX aggregates applied to columns.
These optimizations cant be used if:
o The column is part of an expression or function.
o There is another aggregate in the query.
o The column is not the first column of an index.
o A GROUP BY clause is used.
The following query can NOT use the special MIN/MAX optimizations
because more than one aggregate is being used:
SELECT MIN(SL10_SLAcct10_Num), MAX(SL10_SLAcct10_Num) FROM SLAcct10
Split the query into separate MIN and MAX queries.
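One way to split the query while still returning both values in a single row is to capture each aggregate in a variable. The char(10) datatype is an assumption about the column:

```sql
declare @lo char(10), @hi char(10)

-- One aggregate per query, so each can use the MIN/MAX optimization.
select @lo = min(SL10_SLAcct10_Num) from SLAcct10
select @hi = max(SL10_SLAcct10_Num) from SLAcct10

select @lo, @hi
```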
SUBQUERIES:
Some queries with subqueries can be rewritten as joins with better performance.
Review the following example:
SELECT col FROM t1
WHERE keycol in (SELECT col FROM t2)
Can sometimes be rewritten as:
SELECT t1.col FROM t1, t2
WHERE t1.keycol = t2.col
Review issues of uniqueness and dups when flattening subqueries into the main
query.
BOOLEAN EXPRESSIONS:
When ANDing Boolean expressions (like @variable = 'Lower Interest Rates'), put the
expression MOST LIKELY TO FAIL first. This saves time in evaluating the others.
When ORing Boolean expressions, put the expression MOST LIKELY TO SUCCEED
first.
These considerations are most likely to be important in the context of conditional
testing within SQL WHILE loops.
T-SQL, unlike more complete programming languages with advanced optimizing
compilers (like C), does NOT eliminate loop invariants. That is, if you have code
within a WHILE loop inside a Stored Procedure that performs a static calculation
producing a result that doesn't change for each execution of the loop, the T-SQL
optimizer will not move the calculation outside the loop.
Don't perform invariant static computations inside a SQL WHILE loop or the server
will recalculate the computation for each loop execution.
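Hoisting an invariant out of a loop by hand looks roughly like this. The variables and the calculation are invented for illustration:

```sql
declare @i int, @total money, @rate money
select @i = 1, @total = $0.00

-- Invariant: computed once, before the loop, not on every pass.
select @rate = $100000.00 / 12

while @i <= 12
begin
    select @total = @total + @rate
    select @i = @i + 1
end

select @total
```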
A PLUG FOR SPs:
Create Stored Procedures for queries which will be used repeatedly with only slight
variation (that is, can be parameterized). A Stored Procedure is precompiled - the
query tree is prepared when the procedure is first used and available for use
thereafter.
Stored Procedures reduce network traffic, can help with security (data access) issues,
can be available to all clients and have other advantages.
A PLUG FOR UPDATE STATS:
Accurate density and distribution statistics are essential for accurate optimization of
queries.
If substantial modification has been made to a table (update, insert, delete) make sure
UPDATE STATISTICS is run for the table.
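A minimal sketch of refreshing statistics for the `SLAcct10` table used in earlier examples. The sp_recompile step goes beyond the slide and is included as an assumption about what you will usually want next:

```sql
update statistics SLAcct10
go
-- Assumption beyond the slide: flag stored procedures that reference the
-- table so they are recompiled, using the new statistics, on next execution.
sp_recompile SLAcct10
go
```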
SQL TOPICS FOR ANOTHER DAY...
Tips for proper use of aggregates with GROUP BY and HAVING clauses.

Pivoting and crosstabs with SQL.
Working with row-wise maximums and minimums.
Infrequently used SQL statements: CASE, COALESCE, UNION.
Hardwiring query plans.
A detailed review of index types and their characteristics.
A detailed review of showplan and trace output.
Nuances of correlated and non-correlated subqueries and subquery performance
issues.
Nuances of creating objects with Stored Procedures.
THAT'S ALL FOLKS!
Thanks for your Time!

Bring your SQL questions to Don or Jerome or any of the DBAs for help and insights.
