Q. How can I create a plain-text flat file from SQL Server as input to another
application?
A. Use the bcp utility. More importantly, by using a view, you can export data from multiple joined tables.
The only thing you cannot do is specify the sequence in which the rows are written to
the flat file, because a view does not let you include an ORDER BY clause in it unless
you also use the TOP keyword.
For example, you can use bcp to generate from the pubs database a list of authors
who reside in California by writing the following code:
USE master
GO
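The bcp listing itself is not shown here; a minimal sketch of the idea, running bcp through xp_cmdshell with a query (the output path and the -T trusted-connection switch are assumptions):

```sql
-- export CA authors from pubs to a flat file; path and -T are illustrative
EXEC master..xp_cmdshell 'bcp "SELECT au_fname, au_lname FROM pubs..authors WHERE state = ''CA''" queryout c:\CAAuthors.txt -c -T'
```

The -c switch writes character-mode (plain text) output, which is what a flat-file consumer usually expects.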
The sp_GetBlockInfo procedure tells you the lock mode, the database and object
names of the locked resource, and in the case of a blocking chain, which SPID is the
root blocker. If the process is not blocked,
sp_GetBlockInfo returns an empty recordset.
You can also detect blocks by checking for error 1222, "Lock request time out
period exceeded." The LOCK_TIMEOUT setting controls how long a process will wait
for locks to be released before timing out. When the lock timeout occurs, SQL Server
sends error 1222 to the application. In SQL Server 7.0, this error aborts the
statement but does not cause the batch to roll back, so you can look for the
Transact-SQL system variable @@ERROR and determine where locks exist.
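As a sketch of that pattern (the table, timeout value, and key are illustrative):

```sql
SET LOCK_TIMEOUT 5000   -- wait at most 5 seconds for locks to be released

UPDATE pubs..authors
SET au_lname = au_lname
WHERE au_id = '172-32-1176'

-- @@ERROR must be checked immediately after the statement
IF @@ERROR = 1222
    PRINT 'Lock request timed out -- another process holds a conflicting lock.'
```
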
Q. Can you create UNIQUE and PRIMARY KEY constraints on computed columns in
SQL Server 2000?
A. In SQL Server, the physical mechanism that UNIQUE and PRIMARY KEY constraints
use to enforce uniqueness is a unique index. Because SQL Server 2000 supports
indexes on computed columns,
you can create UNIQUE and PRIMARY KEY constraints on computed columns.
CREATE TABLE T1 (
col1 int NOT NULL,
col2 AS col1 + 1 UNIQUE
)
CREATE TABLE T2 (
col1 int NOT NULL,
col2 AS col1 + 1 PRIMARY KEY
)
Because of the primary key constraint, SQL Server requires you to guarantee that
your computation's
result will not be NULL. The computation in the computed column can overflow
(for example, when you add 1 to the largest integer) or underflow (when you
subtract 1 from the smallest integer), and other computations can result in a
divide-by-zero error. However, if the ARITHABORT session setting (which
determines whether a query is aborted when an overflow or a divide-by-zero
error occurs) and the ANSI_WARNINGS session setting (which specifies ANSI
SQL-92 standard behavior for several error conditions) are both off, the
computation can yield a NULL result instead of aborting the query. Wrapping the
computation in ISNULL guarantees a non-NULL result:
CREATE TABLE T2 (
col1 int NOT NULL,
col2 AS ISNULL(col1 + 1, 0) PRIMARY KEY
)
A. Recompilations might be the source of the slower stored procedure speed. To find
out for sure, you need to do some performance investigation, such as looking at
Showplans for each type of query versus calling the stored procedures and
comparing query plan cache hits to cache misses. You can also try coding the object
owner for referenced tables, views, and procedures inside your stored procedures, as
the following example shows:
This technique helps you reuse plans and prevent cache misses.
A. SQL Server excludes an ORDER BY clause from a view to comply with the ANSI
SQL-92 standard. Because analyzing the rationale for this standard requires a
discussion of the underlying structure of the structured query language (SQL) and
the mathematics upon which it is based, we can't fully explain the restriction here.
However, if you need to be able to specify an ORDER BY clause in a view, consider
using the following workaround:
USE pubs
GO
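The workaround listing is not shown here; a sketch using the TOP 100 PERCENT trick (the view and column names are assumptions):

```sql
CREATE VIEW CAAuthors
AS
SELECT TOP 100 PERCENT au_fname, au_lname
FROM authors
WHERE state = 'CA'
ORDER BY au_lname   -- legal only because TOP is present
```
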
The TOP construct, which Microsoft introduced in SQL Server 7.0, is most useful
when you combine it with the ORDER BY clause. The only time that SQL Server
supports an ORDER BY clause in a view is when it is used in conjunction with the TOP
keyword.
Note that the TOP keyword is a SQL Server extension to the ANSI SQL-92 standard.
Q. Is using the TOP N clause faster than using SET ROWCOUNT N to return a specific
number of rows from a query?
A. With proper indexes, the TOP N clause and SET ROWCOUNT N statement are
equally fast, but with unsorted input from a heap, TOP N is faster. With unsorted
input, the TOP N operator uses a small internal sorted temporary table in which it
replaces only the last row. If the input is nearly sorted, the TOP N engine must delete
or insert the last row only a few times. Nearly sorted means you're dealing with a
heap with ordered inserts for the initial population and without many updates,
deletes, forwarding pointers, and so on afterward. A nearly sorted heap is more
efficient to sort than sorting a huge table. In a test that used TOP N to sort a table
with the same number of rows but with unordered inserts, TOP N was not as efficient
anymore. Usually, the I/O time is the same both with an index and without; however,
without an index SQL Server must do a complete table scan. Processor time and
elapsed time show the efficiency of the nearly sorted heap. The I/O time is the same
because SQL Server must read all the rows either way.
Q. I have two tables t1 and t2 both with the columns a1, a2. I want to find
the difference of (the set of t1) - (the set of t2) without using the keyword
EXCEPT because MSSQL 2000 does not recognize that word. I have tried this
query but it does not give me what I want: SELECT * FROM t1 WHERE NOT
EXISTS (SELECT t1.* FROM t1 INNER JOIN t2 ON t1.a1=t2.a1 AND
t1.a2=t2.a2)
Answer: Your subquery references only t1, so it never compares anything against t2. The difference of (the set of t1) - (the set of t2) is obtained with a NOT EXISTS subquery correlated to t2.
According to SQL Query Analyzer, this is slightly more efficient than the left join
(possibly only because
of the tables I tested it with):
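Sketches of both forms, assuming the two-column schema from the question:

```sql
-- set difference t1 - t2 via a correlated NOT EXISTS
SELECT * FROM t1
WHERE NOT EXISTS (SELECT * FROM t2
                  WHERE t2.a1 = t1.a1 AND t2.a2 = t1.a2)

-- equivalent left-join form
SELECT t1.*
FROM t1
LEFT JOIN t2 ON t1.a1 = t2.a1 AND t1.a2 = t2.a2
WHERE t2.a1 IS NULL
```
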
I have a form and I am passing a value to a stored procedure. The value has
leading zeros. When the values are passed the leading zeros are dropped,
thus causing my stored procedure to blow up. Is there a way to maintain
those zeros in passing or pick them up again in the procedure?
If you were passing a string value, the leading zeros would not be dropped, so I
suspect that you are passing a numeric value and converting it back to a varchar,
which drops any leading zeros. If you don't want to change the interface, you can
always restore the leading zeros in the stored procedure by using the following.
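A sketch of the padding technique (the fixed width of 6 is an assumption):

```sql
DECLARE @val int, @padded varchar(6)
SET @val = 123
-- pad with zeros on the left, then keep the rightmost 6 characters
SET @padded = RIGHT(REPLICATE('0', 6) + CONVERT(varchar(6), @val), 6)
PRINT @padded   -- 000123
```
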
A. By default, SQL Server 7.0 Setup installs SQL Server in a case-insensitive
mode. For most applications this works fine, but there are certain situations
where case-sensitive searches are required. For instance, if a Web site needs
passwords that are case sensitive, a method needs to be devised to perform a
case-sensitive comparison.
In our test we are searching for a 'bUSY' value in the msg column of the test table.
So the syntax of the same query, if the SQL Server was set to be case sensitive,
would be:
This query will return all rows in the test table. Now, here is a script that
performs the case-sensitive search by building the statement in a variable and
running it with EXEC( @sql ).
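A sketch of such a script, using the common trick of comparing varbinary representations (the table and column names come from the example above; the varbinary width is an assumption):

```sql
DECLARE @sql varchar(255)
-- binary comparison is case sensitive regardless of the server sort order
SET @sql = 'SELECT * FROM test ' +
           'WHERE CONVERT(varbinary(40), msg) = CONVERT(varbinary(40), ''bUSY'')'
EXEC( @sql )
```
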
This returns the name of the current database, the owner, the table name, and the
table type for each table in the database. It's possible to query the system tables
directly, but if this gives the information you need,
it's better to use the existing views that come with SQL Server.
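The view in question is presumably INFORMATION_SCHEMA.TABLES; a sketch:

```sql
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE
FROM INFORMATION_SCHEMA.TABLES
```
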
Answer: Let's break down your question into several steps. First, let's create a
sample table using the following code:
create table dups
(
i int
)
go
declare @i int
set @i = 0
Now, let's find rows that are duplicates. For that we can use a simple group by
statement:
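The query producing the output below was lost in this copy; it is presumably:

```sql
select i, count(*) as num_records
from dups
group by i
having count(*) > 1
```
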
i num_records
----------- -----------
0 2
5 2
18 2
22 2
27 2
31 2
34 2
44 2
49 2
This identifies the rows that have duplicates, but it does not return the total
number of duplicates in the table. The first change we must make is to recognize
that each row showing a count of 2 contains only one duplicate.
So we want a query that basically sums up the duplicates from the above query. To
do so, we take the previous query and can put that in the from statement as a
derived table. We then can use the sum function to create the total for us:
select sum(num_dups)
from (select i,count(*)-1 as num_dups
from dups
group by i
having count(*)-1 > 0) as mydups
SQL Server 7 allows developers to execute commands against OLE DB data sources
on different servers. In order to execute commands on remote servers, the SQL
Server instance where the commands will be issued must be setup properly. This
entails adding the remote server to SQL Server's linked server list. Do this by using
the sp_addlinkedserver command.
For example, to link a remote SQL Server database that resides on the
RemoteDBServer server, you would use the following syntax:
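The listing didn't survive here; a sketch of the call:

```sql
EXEC sp_addlinkedserver
    @server = 'RemoteDBServer',
    @srvproduct = 'SQL Server'
```
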
Note that only members of the sysadmin role can set this server option. Once the
remote database has been linked, queries can be executed against it as long as the
remote tables are prefixed with the four-part name Server.Database.Owner.Table.
For example, the following query would return all rows in the authors table of our
RemoteDBServer SQL Server database:
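Assuming the remote server hosts the pubs sample database, the query would look like:

```sql
SELECT *
FROM RemoteDBServer.pubs.dbo.authors
```
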
Question: Can I use a variable in a query with the IN clause (a,b,c..z), without
getting quotes or conversion errors?
Answer: You can use a variable as your IN clause, but this requires that you use the
EXEC function to run the statement.
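A sketch of the dynamic-SQL approach (the table, column, and list values are assumptions):

```sql
DECLARE @list varchar(100)
-- each value carries its own embedded quotes
SET @list = '''CA'',''OR'',''WA'''
EXEC('SELECT * FROM authors WHERE state IN (' + @list + ')')
```

Because the statement is assembled as a string, no conversion or quoting errors occur at parse time; the list is spliced in verbatim.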
Ever wanted to delete files from the local machine that your SQL Server database
is running on?
You can do it using the extended stored procedure xp_cmdshell like this:
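For example (the file path is an assumption):

```sql
EXEC master..xp_cmdshell 'del c:\temp\oldfile.txt'
```
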
But this requires the sysadmin option on the SQL Server and Admin role from NT
server. In most instances it is not preferable to give these privileges. So to delete
files without requiring this access use the built-in SQL Server Automation APIs and
the FileSystemObject:
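A sketch of the OLE Automation route (the file path is an assumption):

```sql
DECLARE @fso int, @hr int
-- instantiate a FileSystemObject through OLE Automation
EXEC @hr = sp_OACreate 'Scripting.FileSystemObject', @fso OUT
-- call its DeleteFile method on the target file
EXEC @hr = sp_OAMethod @fso, 'DeleteFile', NULL, 'c:\temp\oldfile.txt'
-- release the object
EXEC sp_OADestroy @fso
```
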
Large Text Fields
Question: How do I create a text field of greater than 8,000 characters (in
v7.0)? I attempted to use the "text" data type, but my code returned an
error saying the maximum size was 8,000. Here's the code:
CREATE TABLE X ( X_ID int IDENTITY(1,1), X_DESC text (60000) NOT NULL ) GO
Answer: SQL Server is returning a bogus error message. The real error has to do
with your syntax.
When specifying text you don't specify a size.
You can see the real error message if you reduce the number 60000 to 5. Then you
will get this message:
Server: Msg 2716, Level 16, State 1, Line 1
Column or parameter #2: Cannot specify a column width on data type text.
Instead, simply specify the column as text, without the parentheses and the
number. The actual size of the storage used for the text field will depend on how
much data you actually put in the column.
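The corrected DDL, then, would be:

```sql
CREATE TABLE X (
    X_ID int IDENTITY(1,1),
    X_DESC text NOT NULL
)
GO
```
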
Question: I have a table in which the key field has a value stored with a
percent sign, like '1234%'. Using this value, I want to select from another
table that can have values like '1234567', '1234678' and '1234098'. How do
I go about it?
Answer: The percent sign (%) is a wildcard in SQL Server. It can be used at the
beginning or end of a string. So the following syntax will return all of the records you
mentioned:
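The query producing the output below is presumably:

```sql
SELECT * FROM TestTable
WHERE Col LIKE '1234%'
```
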
Col
-------
1234567
1234678
1234098
If you want to do an exact match for '1234' without the percent sign, then you'll
have to trim off the last character, like this:
SELECT * FROM TestTable WHERE Col LIKE LEFT('1234%', (LEN('1234%')-1))
Question: Is it possible to delete duplicate rows in a table without using a temporary
table (i.e., just do it with a single SQL statement)?
Answer: All you need to do is compare the table to itself to find out which candidates
are duplicates. Do this by assigning aliases to the table so you can use it twice, once
as A and again as B, like this:
delete
from jobs
where
job_desc in
(
select
a.job_desc
from
jobs a,
jobs b
where
a.job_desc = b.job_desc
group by
a.job_desc
having
count(a.job_desc) >1
)
When you do this you'll get a count based on the column value you think is
duplicated. I used job_desc because the IDs will be different, so the description is
the candidate for repetition. Join the table to itself on that candidate to find
matches. Everything will match to itself at least once; that's why you group by the
thing you think is duplicated. Applying the HAVING clause squeezes out all the
singletons, leaving only the rows that have counts greater than one.
By the way, this code trashes all the records that are duplicates. If you want to
save one, add a comparison for the IDs to be different in the WHERE clause.
Question: I need a query that retrieves info from an Oracle table and a query that
retrieves info from a SQL Server table. The info has to be joined together according
to Record ID numbers. I have very limited access to the Oracle database but full
control of the SQL Server database. How do I join two different queries from two
different databases?
Answer: To query two different data sources, you can make the Oracle server a
linked server to the SQL Server. A linked server can be any OLE DB data source,
and SQL Server currently supports the OLE DB provider for Oracle. You can add a
linked server with the sp_addlinkedserver stored procedure.
An easier way to add a linked server is to use Enterprise Manager. Add the
server through the Linked Servers icon in the Security node. Once a server is linked,
you can query it using a distributed query (you have to specify the full name).
Here's an example of a distributed query (from the SQL Server Books Online) that
queries the Employees table in SQL Server and the Orders table from Oracle:
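The listing itself is missing here; a sketch of such a distributed query (the linked-server name, Oracle schema, and column names are assumptions):

```sql
SELECT e.LastName, o.ORDER_ID
FROM Northwind.dbo.Employees e
JOIN OracleSvr..SCOTT.ORDERS o   -- four-part name through the linked server
    ON e.EmployeeID = o.EMPLOYEE_ID
```
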
You can use the OPENROWSET( ) function to run a query on a remote SQL server by
using the following syntax:
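A sketch of the OPENROWSET( ) call (the server name and credentials are assumptions; a trusted connection is also possible):

```sql
SELECT a.*
FROM OPENROWSET('SQLOLEDB', 'RemoteDBServer'; 'sa'; 'MyPassword',
                'SELECT * FROM pubs.dbo.authors') AS a
```
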
Checking whether a table exists in a Microsoft SQL Server database is easy. You can
use this query:
SELECT 'x'
FROM sysobjects
WHERE type = 'U' and NAME = 'mytable'
But this query will not work while searching for global temporary tables. Global
temporary tables are stored in tempdb.
Use this syntax for the search (assuming the table name is in a variable,
@temp_table):
IF NOT EXISTS (SELECT 1 FROM tempdb..sysobjects
WHERE type = 'U' AND name = @temp_table)
PRINT 'temp table ' + @temp_table + ' does not exist.'
ELSE
PRINT 'temp table ' + @temp_table + ' exists.'
Note: You cannot search for local temporary tables (# prefix tables) in this way. This
is because SQL Server appends a unique number to the name you supply. For
example, if you specified "#temp," the name in sysobjects would be something like
"#temp____1234."
If you have ever monitored any blocking problems in SQL Server, you know that
sp_who only shows you the spid (SQL Server's internal Process ID) that is causing
the blocking for each spid that is blocked. Often a blocked spid is shown as causing
blocking for another spid. To see the spid (or spids) that started the whole mess off,
execute the following SQL:
SELECT p.spid
,convert(char(12), d.name) db_name
, program_name
, convert(char(12), l.name) login_name
, convert(char(12), hostname) hostname
, cmd
, p.status
, p.blocked
, login_time
, last_batch
, p.spid
FROM master..sysprocesses p
JOIN master..sysdatabases d ON p.dbid = d.dbid
JOIN master..syslogins l ON p.suid = l.suid
WHERE p.blocked = 0
AND EXISTS ( SELECT 1
FROM master..sysprocesses p2
WHERE p2.blocked = p.spid )
We built this into our own version of sp_who, called sp_hywho. See the listing below.
Code for sp_hywho:
IF EXISTS (SELECT 1 FROM sysobjects
where id = object_id('dbo.sp_hywho')
and sysstat & 0xf = 4)
drop procedure dbo.sp_hywho
GO
CREATE PROCEDURE dbo.sp_hywho
@vcDBName sysname = NULL
AS
SET NOCOUNT ON
IF EXISTS ( SELECT 1
FROM master..sysprocesses p
WHERE p.blocked = 0
AND EXISTS ( SELECT 1
FROM master..sysprocesses p2
WHERE p2.blocked = p.spid ) )
BEGIN
PRINT "Blocking caused by:"
PRINT ""
SELECT p.spid
,convert(char(12), d.name) db_name
, program_name
, convert(char(12), l.name) login_name
, convert(char(12), hostname) hostname
, cmd
, p.status
, p.blocked
, login_time
, last_batch
, p.spid
FROM master..sysprocesses p
JOIN master..sysdatabases d ON p.dbid = d.dbid
JOIN master..syslogins l ON p.suid = l.suid
WHERE p.blocked = 0
AND EXISTS ( SELECT 1
FROM master..sysprocesses p2
WHERE p2.blocked = p.spid )
AND (p.dbid = DB_ID( @vcDBName ) OR @vcDBName IS NULL)
ORDER BY 2,IsNull(Ltrim(program_name),"ZZZZZZZZZ"),4,5
PRINT ""
END
SELECT p.spid
,convert(char(12), d.name) db_name
, program_name
, convert(char(12), l.name) login_name
, convert(char(12), hostname) hostname
, cmd
, p.status
, p.blocked
, login_time
, last_batch
, p.spid
FROM master..sysprocesses p
JOIN master..sysdatabases d ON p.dbid = d.dbid
JOIN master..syslogins l ON p.suid = l.suid
WHERE (p.dbid = DB_ID( @vcDBName ) OR @vcDBName IS NULL)
ORDER BY 2,IsNull(Ltrim(program_name),"ZZZZZZZZZ"),4,5
GO
Many times, you'll want to know the last identity (key) value that was used in an
insert. The biggest reason for this is so that the same value can be reused when
inserting a foreign key. This is done differently in SQL Server and DB2. In DB2,
the identity value can be picked up by the application and reused by calling the
IDENTITY_VAL_LOCAL() function, which returns the most recently assigned value
for an identity column. In SQL Server, the last identity value used in an insert can
be retrieved with the @@IDENTITY function.
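For example (the table is an assumption, created only to illustrate):

```sql
CREATE TABLE OrdersDemo (
    OrderID int IDENTITY(1,1),
    CustomerName varchar(40)
)
INSERT INTO OrdersDemo (CustomerName) VALUES ('Acme')
-- the identity value just assigned, reusable when inserting a foreign key
SELECT @@IDENTITY AS LastID
```
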
Except Operator
To find rows in one set that do not exist in another set, use the EXCEPT operator
(as defined in SQL-92 and SQL-99). For example, here's how you find values of
column1 from Table1 that do not exist in column2 of Table2:
SELECT column1 FROM Table1
EXCEPT
SELECT column2 FROM Table2;
The EXCEPT operator removes duplicates, and a single NULL value is returned in
the case of multiple NULL values. To return duplicates, use EXCEPT ALL. Keep in
mind, of course, that other proprietary implementations (such as MINUS in
Oracle) exist.
To date, Microsoft has released six different versions of SQL Server 7.0. These
versions include the Desktop, Standard, and Enterprise Editions, as well as the
Developer, Microsoft Data Engine (MSDE), and Small Business Server Editions. While all
of these versions are SQL Server 7.0, there are some key differences. First of all, you
can run the Desktop Edition on Windows NT Workstation 4.0, Windows NT Server
4.0, and Windows 9x. The Desktop Edition doesn't support the Microsoft Search
Service, OLAP Services, parallel queries, or transaction replication--and it can't be
bought on its own. Instead, you must buy either the Standard or the Enterprise
Edition to get the Desktop Edition. You can run the Standard Edition of SQL Server
only on Windows NT Server 4.0 (or later). This version does support such features as
the Microsoft Search Service, OLAP Services, parallel queries, and transactional
replication. It also supports up to 4 CPUs and 2 GB of RAM. In contrast, the
Enterprise Edition runs only on Windows NT Server 4.0 Enterprise Edition and
supports Microsoft Cluster Server. It also supports all of the features supported in the
Standard Edition--plus up to 32 CPUs and more than 2 GB of RAM. The Developer
Edition of SQL Server is included with Visual Studio for developer use. This version
supports a limited number of connections but does include debugging tools. The
Microsoft Data Engine (MSDE) version of SQL Server is simply a run-time engine that's
included as part of Microsoft Office 2000. Although the MSDE version includes some
of the management utilities, it doesn't include all of them. The MSDE was designed
for you to distribute as part of an application, not as a stand-alone product. Finally,
the Small Business Server Edition is part of Microsoft's Small Business Server. This
version of SQL Server is essentially the same as the Standard Edition but comes
hard-coded with a limit of 100 users and a maximum database size of 10 GB.
You can set SQL Server to display information regarding the amount of disk activity
generated by T-SQL statements. This option displays the number of scans, the
number of logical reads (pages accessed), and the number of physical reads (disk
accesses) for each table referenced in the statement. This option also displays the
number of pages written for each statement. When STATISTICS IO is ON, statistical
information is displayed. When OFF, the information is not displayed. After this option
is set ON, all subsequent T-SQL statements return the statistical information until the
option is set to OFF.
Here is the syntax:
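The syntax is simply SET STATISTICS IO {ON | OFF}; for example:

```sql
SET STATISTICS IO ON
GO
SELECT * FROM pubs.dbo.authors
GO
SET STATISTICS IO OFF
GO
```
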
Derived Tables
Use 'Derived tables' wherever possible, as they perform better. Consider the
following query to find the second highest salary from the Employees table:
SELECT MIN(Salary)
FROM Employees
WHERE EmpID IN
(
SELECT TOP 2 EmpID
FROM Employees
ORDER BY Salary Desc
)
The same query can be re-written using a derived table, as shown below, and it
performs twice as fast as the above query:
SELECT MIN(Salary)
FROM
(
SELECT TOP 2 Salary
FROM Employees
ORDER BY Salary DESC
) AS A
This is just an example, and your results might differ in different scenarios
depending on the database design, indexes, volume of data, etc. So, test all the
possible ways a query can be written and choose the one that performs best.
The biggest benefit of using derived tables over using temporary tables is that they
require fewer steps, and everything happens in memory instead of a combination of
memory and disk. The fewer the steps involved, along with less I/O, the faster the
performance.
Here are the steps when you use a temporary table:
1) Lock the tempdb database
2) CREATE the temporary table (write activity)
3) SELECT data & INSERT data (read & write activity)
4) SELECT data from temporary table and permanent table(s) (read activity)
5) DROP TABLE (write activity)
6) Release the locks
Compare the above to the number of steps it takes for a derived table:
1) CREATE locks, unless isolation level of "read uncommitted" is used
2) SELECT data (read activity)
3) Release the locks
Using derived tables instead of temporary tables reduces disk I/O and can boost
performance. Now let's see how.
1. If you have the choice of using a join or a subquery to perform the same
task, generally the join is faster. But this is not always the case, so you may
want to test the query using both methods to determine which is faster for your
particular application. [6.5, 7.0, 2000]
4. When you have a choice of using the IN or the EXISTS clause in your
Transact-SQL, you will generally want to use the EXISTS clause, as it is more
efficient and performs faster. [6.5, 7.0, 2000] Added 8-7-2000
7. Where possible, avoid string concatenation, as it is not a fast process.
[6.5, 7.0, 2000] Added 8-15-2000
11. Both the MIN() and MAX() functions can take advantage of indexes on
columns. So if you perform these functions often, you might want to add an
index to the relevant columns, assuming they don't already exist. [6.5, 7.0, 2000]
Added 8-14-2000
12. Generally, avoid using optimizer hints in your queries. This is because it is
generally very hard to outguess the Query Optimizer. Optimizer hints are special
keywords that you include with your query to force how the Query Optimizer
runs. If you decide to include a hint in a query, this forces the Query Optimizer to
become static, preventing the Query Optimizer from dynamically adapting to the
current environment for the given query. More often than not, this hurts, not
helps performance.
If you think that a hint might be necessary to optimize your query, be sure you
first do all of the following:
· Update the statistics on the relevant tables.
· If the problem query is inside a stored procedure, recompile it.
· Review the search arguments to see if they are sargable, and if not,
try to rewrite them so that they are sargable.
· Review the current indexes, and make changes if necessary.
If you have done all of the above, and the query is not running as you expect,
then you may want to consider using an appropriate optimizer hint.
If you haven't heeded my advice and have decided to use some hints, keep in
mind that as your data changes, and as the Query Optimizer changes (through
service packs and new releases of SQL Server), your hard-coded hints may no
longer offer the benefits they once did. So if you use hints, you need to
periodically review them to see if they are still performing as expected. [6.5, 7.0,
2000] Updated 3-6-2001
13. If you want to boost the performance of a query that includes an AND
operator in the WHERE clause, consider the following:
· Of the search criteria in the WHERE clause, at least one of them should
be based on a highly selective column that has an index.
· If at least one of the search criteria in the WHERE clause is not highly
selective, consider adding indexes to all of the columns referenced in the
WHERE clause.
[7.0, 2000] Added 9-11-2000
14. While views are often convenient to use, especially for restricting
users from seeing data they should not see, they aren't good for
performance. So if database performance is your goal, avoid using views (SQL
Server 2000 Indexed Views are another story).
Here's why. When the Query Optimizer gets a request to run a view, it runs it just
as if you had run the view's SELECT statement from Query Analyzer. In fact,
because of the additional overhead caused by the view, a view runs slightly slower
than the same SELECT statement run from Query Analyzer, although you probably
would not notice the difference. Unlike stored procedures, views offer no
pre-optimization.
Instead of embedding SELECT statements in a view, put them in a stored
procedure instead for optimum performance. Not only do you get the added
performance boost, you can also use the stored procedure to restrict user access
to table columns they should not see. [6.5, 7.0, 2000] Added 5-7-2001
15. Try to avoid nesting views (referring to a view from within a view). While
this is not prohibited, it makes it more difficult to identify the source of any
performance problems. A better idea is to create separate views instead of
nesting them. [6.5, 7.0, 2000] Added 10-9-2000
17. If your SELECT statement includes an IN option along with a list of values
to be tested in the query, order the list of values so that the most frequently
found values are placed at the first of the list, and the less frequently found
values are placed at the end of the list. This can speed performance because the
IN option returns true as soon as any of the values in the list produce a match.
The sooner the match is made, the faster the query completes. [6.5, 7.0, 2000]
Added 11-27-2000
18. If you need to use the SELECT INTO option, keep in mind that it can lock
system tables, preventing other users from accessing the data they need. If you
do need to use SELECT INTO, try to schedule it when your SQL Server is less
busy, and try to keep the amount of data inserted to a minimum. [6.5, 7.0,
2000] Added 11-28-2000
19. If your SELECT statement contains a HAVING clause, write your query so
that the WHERE clause does most of the work (removing undesired rows) instead
of making the HAVING clause remove them. Using the WHERE clause
appropriately can eliminate unnecessary rows before they get to the GROUP BY
and HAVING clauses, saving some unnecessary work and boosting performance.
For example, in a SELECT statement with WHERE, GROUP BY, and HAVING
clauses, here's what happens. First, the WHERE clause is used to select the
appropriate rows that need to be grouped. Next, the GROUP BY clause divides the
rows into sets of grouped rows, and then aggregates their values. And last, the
HAVING clause then eliminates undesired aggregated groups. If the WHERE
clause is used to eliminate as many of the undesired rows as possible, this means
the GROUP BY and the HAVING clauses will have less work to do, boosting the
overall performance of the query. [6.5, 7.0, 2000] Added 12-11-2000
20. If you need to write a SELECT statement to retrieve data from a single
table, don't SELECT the data from a view that points to multiple tables. Instead,
SELECT the data from the table directly, or from a view that only contains the
table you are interested in. If you SELECT the data from the multi-table view, the
query will experience unnecessary overhead, and performance will be hindered.
[6.5, 7.0, 2000] Added 12-11-2000
22. The GROUP BY clause can be used with or without an aggregate function. But if
you want optimum performance, don't use the GROUP BY clause without an
aggregate function. This is because you can accomplish the same end result by
using the DISTINCT option instead, and it is faster.
For example, you could write your query two different ways:
USE Northwind
SELECT OrderID
FROM [Order Details]
WHERE UnitPrice > 10
GROUP BY OrderID
or
USE Northwind
SELECT DISTINCT OrderID
FROM [Order Details]
WHERE UnitPrice > 10
Both of the above queries produce the same results, but the second one will use
less resources and perform faster. [6.5, 7.0, 2000] Added 1-12-2001
23. Avoid splitting work across multiple queries when one will do. Suppose, for
example, that you want to raise all unit prices over 5 by 6 percent and round the
results to two decimal places. One option is to use two queries:
USE Northwind
UPDATE Products
SET UnitPrice = UnitPrice * 1.06
WHERE UnitPrice > 5
GO
UPDATE Products
SET UnitPrice = ROUND(UnitPrice, 2)
WHERE UnitPrice > 5
GO
or
USE Northwind
UPDATE Products
SET UnitPrice = ROUND(UnitPrice * 1.06, 2)
WHERE UnitPrice > 5
GO
As is obvious from this example, the first option requires two queries to
accomplish the same task as the second query. Running one query instead of two
or more usually produces the best performance. [6.5, 7.0, 2000] Added 1-19-
2001.
24. Sometimes perception is more important than reality. For example, which
of the following two queries is faster?
· A query that takes 30 seconds to run, and then displays all of the
required results.
· A query that takes 60 seconds to run, but displays the first screen full of
records in less than 1 second.
Most DBAs would choose the first option as it takes less server resources and
performs faster. But from many user's point-of-view, the second one may be
more palatable. By getting immediate feedback, the user gets the impression
that the application is fast, even though in the background, it is not.
If you run into situations where perception is more important than raw
performance, consider using the FAST query hint. The FAST query hint is used
with the SELECT statement using this form:
OPTION(FAST number_of_rows)
where number_of_rows is the number of rows that are to be displayed as fast as
possible.
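For example (the table and row count are illustrative):

```sql
SELECT OrderID, OrderDate
FROM Northwind.dbo.Orders
ORDER BY OrderDate
OPTION (FAST 25)   -- stream the first 25 rows back as quickly as possible
```
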
When this hint is added to a SELECT statement, it tells the Query Optimizer to
return the specified number of rows as fast as possible, without regard to how
long it will take to perform the overall query. Before rolling out an application
using this hint, I would suggest you test it thoroughly to see that it performs as
you expect. You may find out that the query may take about the same amount of
time whether the hint is used or not. If this the case, then don't use the hint.
[7.0, 2000] Added 3-6-2001
2. If your application needs to insert a large binary value into an image
data column, perform this task using a stored procedure, not using an INSERT
statement embedded in your application. This is because the
application must first convert the binary value into a character string (which
doubles its size, thus increasing network traffic and taking more time) before it
can be sent to the server. And when the server receives the character string, it
then has to convert it back to the binary format (taking even more time). Using a
stored procedure avoids all this. [6.5, 7.0, 2000]
Consider adding an index to larger temporary tables. While most temporary
tables probably won't need, or even can't use, an index, some larger temporary
tables can benefit from one. A properly designed index on a temporary table can
be as great a benefit as a properly designed index on a standard database table.
[6.5, 7.0, 2000] Added 8-14-2000
11. Both the MIN() and MAX() functions can take advantage of indexes on
columns. So if you perform these functions often, you might want to add an
index to the relevant columns, assuming they don't already exit. [6.5, 7.0, 2000]
Added 8-14-2000
12. Generally, avoid using optimizer hints in your queries. This is because it is
generally very hard to outguess the Query Optimizer. Optimizer hints are special
keywords that you include with your query to force how the Query Optimizer
runs. If you decide to include a hint in a query, this forces the Query Optimizer to
become static, preventing the Query Optimizer from dynamically adapting to the
current environment for the given query. More often than not, this hurts, not
helps performance.
If you think that a hint might be necessary to optimize your query, be sure you
do all of the following first:
· Update the statistics on the relevant tables.
· If the problem query is inside a stored procedure, recompile it.
· Review the search arguments to see if they are sargable, and if not,
try to rewrite them so that they are sargable.
· Review the current indexes, and make changes if necessary.
If you have done all of the above, and the query is not running as you expect,
then you may want to consider using an appropriate optimizer hint.
If you haven't heeded my advice and have decided to use some hints, keep in
mind that as your data changes, and as the Query Optimizer changes (through
service packs and new releases of SQL Server), your hard-coded hints may no
longer offer the benefits they once did. So if you use hints, you need to
periodically review them to see if they are still performing as expected. [6.5, 7.0,
2000] Updated 3-6-2001
13. If you want to boost the performance of a query that includes an AND
operator in the WHERE clause, consider the following:
· Of the search criteria in the WHERE clause, at least one of them should
be based on a highly selective column that has an index.
· If at least one of the search criteria in the WHERE clause is not highly
selective, consider adding indexes to all of the columns referenced in the
WHERE clause.
[7.0, 2000] Added 9-11-2000
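For example (using the Northwind Employees table, and assuming LastName is
indexed and highly selective in your data), a selective, indexed column among the
AND'ed criteria gives the Query Optimizer an efficient starting point:

```sql
SELECT EmployeeID, LastName, Title
FROM Employees
WHERE LastName = 'Fuller'       -- highly selective, indexed
  AND Title = 'Sales Manager'   -- less selective
```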
14. While views are often convenient to use, especially for restricting
users from seeing data they should not see, they aren't good for
performance. So if database performance is your goal, avoid using views (SQL
Server 2000 Indexed Views are another story).
Here's why. When the Query Optimizer gets a request to run a view, it runs it just
as if you had run the view's SELECT statement from the Query Analyzer. In fact, a
view runs slightly slower than the same SELECT statement run from the Query
Analyzer (because of the additional overhead caused by the view), though you
probably would not notice the difference. Unlike stored procedures, views offer no
pre-optimization.
Instead of embedding SELECT statements in a view, put them in a stored
procedure for optimum performance. Not only do you get the added
performance boost, you can also use the stored procedure to restrict user access
to table columns they should not see. [6.5, 7.0, 2000] Added 5-7-2001
15. Try to avoid nesting views (referring to a view from within a view). While
this is not prohibited, it makes it more difficult to identify the source of any
performance problems. A better idea is to create separate views instead of
nesting them. [6.5, 7.0, 2000] Added 10-9-2000
17. If your SELECT statement includes an IN option along with a list of values
to be tested in the query, order the list of values so that the most frequently
found values are placed at the beginning of the list, and the less frequently found
values are placed at the end of the list. This can speed performance because the
IN option returns true as soon as any of the values in the list produce a match.
The sooner the match is made, the faster the query completes. [6.5, 7.0, 2000]
Added 11-27-2000
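For example, if you know (or can reasonably guess) that most of your orders ship
to the USA, test that value first (the value distribution here is an assumption):

```sql
-- 'USA' assumed to be the most common value, so it is tested first
SELECT OrderID, ShipCountry
FROM Orders
WHERE ShipCountry IN ('USA', 'Germany', 'Norway')
```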
18. If you need to use the SELECT INTO option, keep in mind that it can lock
system tables, preventing other users from accessing the data they need. If you
do need to use SELECT INTO, try to schedule it when your SQL Server is less
busy, and try to keep the amount of data inserted to a minimum. [6.5, 7.0,
2000] Added 11-28-2000
19. If your SELECT statement contains a HAVING clause, write your query so
that the WHERE clause does most of the work (removing undesired rows) instead
of making the HAVING clause do the work of removing undesired rows. Using the WHERE
clause appropriately can eliminate unnecessary rows before they get to the
GROUP BY and HAVING clause, saving some unnecessary work, and boosting
performance.
For example, in a SELECT statement with WHERE, GROUP BY, and HAVING
clauses, here's what happens. First, the WHERE clause is used to select the
appropriate rows that need to be grouped. Next, the GROUP BY clause divides the
rows into sets of grouped rows, and then aggregates their values. And last, the
HAVING clause then eliminates undesired aggregated groups. If the WHERE
clause is used to eliminate as many of the undesired rows as possible, this means
the GROUP BY and the HAVING clauses will have less work to do, boosting the
overall performance of the query. [6.5, 7.0, 2000] Added 12-11-2000
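As a sketch against the Northwind [Order Details] table, both of these queries
return the same groups, but the second removes the unwanted rows before any
grouping work is done:

```sql
-- HAVING does the filtering after every row has been grouped
SELECT OrderID, SUM(Quantity) AS TotalQty
FROM [Order Details]
GROUP BY OrderID
HAVING OrderID > 11000

-- WHERE removes the unwanted rows first, so fewer rows are grouped
SELECT OrderID, SUM(Quantity) AS TotalQty
FROM [Order Details]
WHERE OrderID > 11000
GROUP BY OrderID
```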
20. If you need to write a SELECT statement to retrieve data from a single
table, don't SELECT the data from a view that points to multiple tables. Instead,
SELECT the data from the table directly, or from a view that only contains the
table you are interested in. If you SELECT the data from the multi-table view, the
query will experience unnecessary overhead, and performance will be hindered.
[6.5, 7.0, 2000] Added 12-11-2000
22. The GROUP BY clause can be used with or without an aggregate function. But if
you want optimum performance, don't use the GROUP BY clause without an
aggregate function. This is because you can accomplish the same end result by
using the DISTINCT option instead, and it is faster.
For example, you could write your query two different ways:
USE Northwind
SELECT OrderID
FROM [Order Details]
WHERE UnitPrice > 10
GROUP BY OrderID
or
USE Northwind
SELECT DISTINCT OrderID
FROM [Order Details]
WHERE UnitPrice > 10
Both of the above queries produce the same results, but the second one will use
less resources and perform faster. [6.5, 7.0, 2000] Added 1-12-2001
GO
USE Northwind
UPDATE Products
SET UnitPrice = ROUND(UnitPrice, 2)
WHERE UnitPrice > 5
GO
or
USE Northwind
UPDATE Products
SET UnitPrice = ROUND(UnitPrice * 1.06, 2)
WHERE UnitPrice > 5
GO
As is obvious from this example, the first option requires two queries to
accomplish the same task as the second query. Running one query instead of two
or more usually produces the best performance. [6.5, 7.0, 2000] Added 1-19-
2001.
24. Sometimes perception is more important than reality. For example, which
of the following two queries is faster:
· A query that takes 30 seconds to run, and then displays all of the
required results.
· A query that takes 60 seconds to run, but displays the first screen full of
records in less than 1 second.
Most DBAs would choose the first option, as it takes fewer server resources and
performs faster. But from many users' point of view, the second one may be
more palatable. By getting immediate feedback, the user gets the impression
that the application is fast, even though, in the background, it is not.
If you run into situations where perception is more important than raw
performance, consider using the FAST query hint. The FAST query hint is used
with the SELECT statement using this form:
OPTION(FAST number_of_rows)
where number_of_rows is the number of rows that are to be displayed as fast as
possible.
When this hint is added to a SELECT statement, it tells the Query Optimizer to
return the specified number of rows as fast as possible, without regard to how
long it will take to perform the overall query. Before rolling out an application
using this hint, I would suggest you test it thoroughly to see that it performs as
you expect. You may find that the query takes about the same amount of
time whether the hint is used or not. If this is the case, then don't use the hint.
[7.0, 2000] Added 3-6-2001
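For example, if the first screenful shown to a user is about 20 rows, a query
might look like this (the table and sort order are illustrative):

```sql
SELECT OrderID, CustomerID, OrderDate
FROM Orders
ORDER BY OrderDate DESC
OPTION (FAST 20)   -- return the first 20 rows as quickly as possible
```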
1. When using the WHILE statement, don't avoid the use of BREAK just
because some people consider it bad programming form. Often when
creating Transact-SQL code using the WHILE statement, you can avoid using
BREAK by moving a few lines of code around. If this works in your case, then by
all means don't use BREAK. But if your efforts to avoid using BREAK require you
to add additional lines of code that makes your code run slower, then don't do
that. Sometimes, using BREAK can speed up the execution of your WHILE
statements. [6.5, 7.0, 2000] Added 5-18-2001
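As a minimal sketch (the WorkQueue table is hypothetical), BREAK lets the loop
quit as soon as its real work is done, rather than forcing extra bookkeeping into
the WHILE condition:

```sql
DECLARE @batch int
SET @batch = 1
WHILE @batch <= 100
BEGIN
   IF NOT EXISTS (SELECT * FROM WorkQueue)
      BREAK   -- the queue is empty, so exit the loop immediately
   -- process one batch of work here
   SET @batch = @batch + 1
END
```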
2. Computed columns in SQL Server 2000 can be indexed if they meet all
of the following criteria:
· The computed column's expression is deterministic.
· The ANSI_NULLS connection-level setting was ON when the table was
created.
· TEXT, NTEXT, or IMAGE data types are not used in the computed column.
· The physical connection used to create the index, and all connections
used to INSERT, UPDATE, or DELETE rows in the table must have these six
SET options properly configured: ANSI_NULLS = ON, ANSI_PADDING = ON,
ANSI_WARNINGS = ON, ARITHABORT = ON, CONCAT_NULL_YIELDS_NULL =
ON, QUOTED_IDENTIFIER = ON, NUMERIC_ROUNDABORT = OFF.
[2000] Added 10-9-2000
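Putting the criteria together, a table with an indexable computed column might be
created like this (the table and column names are illustrative):

```sql
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
GO
CREATE TABLE OrderTotals (
   UnitPrice money NOT NULL,
   Quantity int NOT NULL,
   LineTotal AS UnitPrice * Quantity   -- deterministic, no TEXT/NTEXT/IMAGE
)
GO
CREATE INDEX IX_OrderTotals_LineTotal ON OrderTotals (LineTotal)
GO
```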
3. One of the advantages of using SQL Server for two-tier and three-tier
applications is that you can offload much (if not most) of the data
processing work from the other tiers and place it on SQL Server. The more
work you can perform within SQL Server, the fewer the network roundtrips that
need to be made between the various tiers and SQL Server. And generally the
fewer the network roundtrips, the more scalable and faster the application
becomes.
But in some applications, such as those that involve complex math, SQL Server
has traditionally been weak. In these cases, complex math often could not be
performed within SQL Server; instead, it had to be performed on another tier,
causing more network roundtrips than desired.
Now that SQL Server 2000 supports user-defined functions (UDFs), this is
becoming less of a problem. UDFs allow developers to perform many complex
math functions from within SQL Server, functions that previously could only be
performed outside of SQL Server. By taking advantage of UDFs, more work can
stay with SQL Server instead of being shuttled to another tier, reducing network
roundtrips, and potentially boosting your application's performance.
Obviously, boosting your application's performance is not as simple as moving
math functions to SQL Server, but it is one more new feature of SQL Server 2000
that developers can take advantage of in order to boost their application's
scalability and performance. [2000] Added 12-19-2000
5. SQL Server 2000 offers a new data type called "table." Its main purpose is for
the temporary storage of a set of rows. A variable, of type "table," behaves as if
it is a local variable. And like local variables, it has a limited scope, which is
within the batch, function, or stored procedure in which it was declared. In most
cases, a table variable can be used like a normal table. SELECTs, INSERTs,
UPDATEs, and DELETEs can all be made against a table variable.
For best performance, if you need a temporary table in your Transact-
SQL code, try to use a table variable instead of creating a conventional
temporary table. Table variables are created and manipulated in
memory instead of the tempdb database, making them much faster. In addition,
table variables found in stored procedures result in fewer compilations (than
when using temporary tables), and transactions using table variables only last as
long as the duration of an update on the table variable, requiring less locking and
logging resources. [2000] Added 8-7-2001
6. Don't repeatedly reuse the same function to calculate the same result
over and over within your Transact-SQL code. For example, if you need to
reuse the value of the length of a string over and over within your code, perform
the LEN function once on the string, assign the result to a variable, and
then use this variable, over and over, as needed in your code. Don't recalculate
the same value over and over again by reusing the LEN function each time you
need the value, as it wastes SQL Server resources and hurts performance. [6.5,
7.0, 2000] Added 8-9-2001
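A minimal sketch of the idea:

```sql
DECLARE @s varchar(100), @s_len int
SET @s = 'Some string whose length is needed repeatedly'
SET @s_len = LEN(@s)   -- calculated once

-- reuse the variable instead of calling LEN(@s) again
IF @s_len > 10 PRINT 'longer than 10 characters'
IF @s_len <= 100 PRINT 'fits in the column'
```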
7. Most of you are probably familiar with the aggregate SUM() function and how
it works. Occasionally, it would be nice if SQL Server had a PRODUCT()
function, which it does not. While SUM() is used to sum a group of data, the
theoretical PRODUCT() function would find the product of a group of data.
One way around the problem of there not being a PRODUCT() function in SQL
Server is to use some combination of a cursor and/or temporary tables. As you
can imagine, this would not be very efficient. A better choice would be to use a
set-based function, like the theoretical PRODUCT() function.
With a little algebra, you can simulate a PRODUCT() function in SQL Server using
the built-in SQL Server LOG10(), POWER(), and SUM() functions working together.
This is because logarithms allow you to find the product of numbers by summing
them. This was how the products of large numbers were found before the days of
calculators. (Are you old enough to remember using logarithm tables in school? I
am. Ouch!)
Below is a very simple example of how you can use a combination of the
LOG10(), POWER(), and SUM() functions in SQL Server to simulate a PRODUCT()
function. You will probably want to modify it to meet your specific needs, such as
to eliminate null data, zero data, or data that might be negative.
SELECT column_name1, POWER(10,SUM(LOG10(column_name2))) AS Product
FROM table_name
GROUP BY column_name1
For example, let's look at the following to see how this works.
Record 1 (1000, 2)
Record 2 (1000, 2)
Record 3 (1000, 2)
Record 4 (1001, 3)
Record 5 (1001, 3)
Our goal here is to find the product of all the records where column_name1 = 1000
and to find the product of all the records where column_name1 = 1001.
When the above query is run, we get these results:
1000, 8
1001, 9
What has happened is that where column_name1 = 1000 (which are the first
three records in our sample data), the values in column_name2 (which are 2 and
2 and 2) are multiplied together to return 8. In addition, where column_name1 =
1001 (which are the last two records in our sample data), the values in
column_name2 (which are 3 and 3) are multiplied together to return 9.
Creating your own PRODUCT() function produces much faster results than trying
to accomplish the same task by using a cursor and/or temporary tables. [6.5,
7.0, 2000] Added 10-11-2001
will be consecutively numbered. This means there will most likely be occasional
gaps in the identity column numbering scheme. For most applications, occasional
gaps in the identity column present no problems.
On the other hand, some developers don't like these occasional gaps and try to
avoid them. With some clever use of INSTEAD OF triggers in SQL Server 2000, it
is possible to prevent these numbering gaps. But at what cost?
Trying to force an identity column to number consecutively without gaps can
lead to locking and scalability problems, hurting performance.
So the recommendation is not to try to get around the identity column's built-in
method of working. If you do, expect performance problems. [2000] Added 10-17-2001
Performance Tuning Tips for Creating Visual Basic Applications
Using SQL Server
10. While ADO (and other VB object libraries) make database manipulation
easy for the programmer, using these shortcuts can kill SQL Server performance.
As a rule of thumb, encapsulate your DML (Data Manipulation Language) in
stored procedures and run them from your VB application. This bypasses object
library overhead (such as reducing cursors) and reduces the chatter between the
VB application and SQL Server over the network.
So what does this mean in practice? Essentially, avoid using the ADO recordset
object to modify (INSERT, UPDATE, DELETE) data in your VB code. Instead, use
Transact-SQL, encapsulated in stored procedures, to modify data in a SQL Server
database. An ADO recordset should be used as a method of reading data, not
modifying data. [6.5, 7.0, 2000] Updated 9-12-2001
11. When using an ADO recordset to return data from SQL Server, the
most efficient way is to use what is often called a firehose cursor. The term
firehose cursor is really a misnomer, because it is not a cursor. A firehose
cursor is just a method to quickly move data from SQL Server to the client that
requested it.
Essentially, a firehose cursor sends the requested data (from the query) to an
output buffer on SQL Server. Once the output buffer is full, it waits until the client
can retrieve the data from the output buffer. Then the output buffer is filled
again. This process repeats over and over until all of the data is sent to the client.
Another advantage of this method is that records are only locked long enough to
be moved to the output buffer.
When you open an ADO RecordSet and use its default settings, a firehose cursor
is used automatically. If you want to specify a firehose cursor
manually, you can do so by using these property settings:
· CursorType = adForwardOnly
· CursorLocation = adUseServer
· LockType = adLockReadOnly
· CacheSize = 1
When the client receives the data from the firehose cursor, the data should be
read into a local data structure for local use by the client. [6.5, 7.0, 2000]
Updated 9-12-2001
· When the packets arrive at SQL Server, they must be converted back into
a form useable by SQL Server.
· SQL Server must then process the Transact-SQL statement. Assuming a
stored procedure is not used, then this code must be optimized and compiled,
then executed.
· The results, in the form of TDS (Tabular Data Stream), are then
translated into packets that can be sent over the network.
· The packets move over the network, again.
· When the packets arrive at the client, they must be converted back into
TDS format.
· When ADO receives the TDS data, it is converted into a recordset, ready
to be used by the application.
If you know much about the technical details of networking, then you know that
the above steps have been oversimplified. The point to remember is that round-
trips between your application and SQL Server are expensive in time and
resources, and you need to do your best in your code to minimize them. [6.5,
7.0, 2000] 7-19-01
13. One way to help reduce round-trips between your application and SQL
Server is to move the data you need at the client from SQL Server in a
single query, not in multiple queries. I have seen some applications that only
retrieve one row at a time, making a round-trip for every row needed by the
application. This can be very expensive in resources and it hurts performance. Of
course, you can't always know what rows will be needed ahead of time, but the
better you can guess, even if you guess and return too many rows, returning
them in one round-trip is usually more efficient than retrieving only one row at a
time. [6.5, 7.0, 2000] 7-19-01
14. When retrieving data from a SQL Server 7 database, take full advantage of
views when appropriate. This is especially true if you are not encapsulating your
Transact-SQL in stored procedures as recommended. While calling a view is not
usually as efficient as using a stored procedure to retrieve data, it is much more
efficient than using embedded Transact-SQL in your ASP code or COM
components. [6.5, 7.0, 2000]
15. Don't use DAO to access SQL Server; it is performance suicide. Also avoid
ODBCDirect. Instead, use RDO or ADO. [6.5, 7.0, 2000]
16. When creating a connection using ADO, be sure you use the OLE DB
provider for SQL Server, not the older ODBC driver for SQL Server or the OLE DB
provider for ODBC. The parameter you will use in your connection string is
"provider=sqloledb". The OLE DB provider performs much more efficiently than
the ODBC provider, providing better performance. [7.0, 2000] 7-19-2001
18. If you are VB developer and need to access SQL Server data, but don't have
the time or interest in learning how to write stored procedures, consider using
the GetRows method of the RecordSet object. The GetRows method is used to
pull all the records from the recordset into an array, which is much faster than
using embedded Transact-SQL to download a RecordSet to your application. [6.5,
7.0, 2000]
20. If you have a related group, or batch, of Transact-SQL statements you want to
execute, but you don't want to use a stored procedure, as generally
recommended for dealing with batches of Transact-SQL statements, one option
you can use to boost performance in your VB code is to concatenate two or
more separate Transact-SQL statements into a single batch and execute
them as a single message. This is much more efficient than sending the
Transact-SQL code to SQL Server as many different messages. [6.5, 7.0, 2000]
Added 7-19-01
2. If your application allows users to run queries, but you are unable in your
application to easily prevent users from returning hundreds, even thousands of
unnecessary rows of data they don't need, consider using the TOP operator
within the query. This way, you can limit how many rows are returned, even if the
user doesn't enter any criteria to help reduce the number of rows returned to the
client. [6.5, 7.0, 2000]
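For example (the row cap of 100 is an arbitrary choice for illustration):

```sql
-- Never return more than 100 rows, even if the user supplies no criteria
SELECT TOP 100 CustomerID, CompanyName
FROM Customers
ORDER BY CompanyName
```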
3. If your application needs to perform looping, try to put the loop inside a
stored procedure so it can be executed on the server without having to make
round trips between the client and server. [6.5, 7.0, 2000]
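A sketch of the idea (the procedure name and batching scheme are illustrative):
the entire loop runs on the server, so the client makes a single call instead of
one round trip per iteration:

```sql
CREATE PROCEDURE RoundPricesInBatches
AS
DECLARE @i int
SET @i = 0
WHILE @i < 10
BEGIN
   -- each pass works on one slice of the table, entirely server-side
   UPDATE Products
   SET UnitPrice = ROUND(UnitPrice, 2)
   WHERE ProductID % 10 = @i
   SET @i = @i + 1
END
GO
```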
6. If you have the need to filter or sort data on-the-fly at the client, let
ADO do this for you at the client. When the data is first requested by the
client from the server (ideally using a stored procedure), have all the data the
client wants to "play" with sent to the client. Once the recordset is at the client,
then ADO methods can be used to filter or sort the data. This helps to reduce
network traffic and takes some of the load off of the server. [6.5, 7.0, 2000]
needs to be set to a much higher figure, such as between 100 and 500,
depending on the number of rows that are to be eventually returned from the
server to the client. [6.5, 7.0, 2000]
8. When calling SQL Server stored procedures from the ADO Command
object, don't use the Refresh method to identify the parameters of a stored
procedure. This produces extra network traffic and slows performance. Instead,
explicitly create the parameters yourself using ADO code. [7.0, 2000]
9. ADO allows you to create four different types of SQL Server cursors. Each has
its own place, and you will want to choose the cursor that uses the least
possible resources for the task at hand. When at all possible, attempt to use
the Forward-Only cursor, which uses the least amount of overhead of the four
cursor types. [6.5, 7.0, 2000]
10. Avoid using the MoveFirst method of the RecordSet object when using
a Forward-Only cursor. In effect, when you use this method, it re-executes the
entire query and repopulates the Forward-Only cursor, increasing server
overhead. [6.5, 7.0, 2000] Added 9-12-2001
11. If you create COM objects to encapsulate database access, try to follow these
two suggestions if you want optimum speed: 1) use in-process dlls; and 2) use
early-binding. [6.5, 7.0, 2000]
13. When storing your SQL Server data into VB variables, always use
strongly typed variables. Avoid using the variant data type (which is not always
possible), as it has greater overhead than the other data types. [6.5, 7.0, 2000]
14. If you create object variables in your VB code to refer to COM objects
that hold SQL Server data, be sure to strongly type them. Avoid using the AS
OBJECT keywords, instead, always explicitly specify the type of object you want
to create. [6.5, 7.0, 2000]
15. When instantiating COM objects to hold SQL Server data, create them
explicitly, not implicitly. [6.5, 7.0, 2000]
16. If you will be calling the same stored procedure, view, or SQL
statements over and over again in your code, don't create a new Command
object each time. Instead, reuse the Command object. [6.5, 7.0, 2000]
17. When looping through recordsets, be sure you bind columns to field
objects before the looping begins. Don't use the Fields collection of the Recordset
object to assign values for fields in a Recordset within each loop, as it incurs much
more overhead. [6.5, 7.0, 2000]
18. If you know that the results of a query from within a stored procedure
you call will return only one row of data (and not an entire recordset), don't
open an ADO Recordset for the purpose of retrieving the data. Instead, use a
stored procedure output parameter. [6.5, 7.0, 2000]
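For example, a stored procedure that returns a single value through an OUTPUT
parameter might look like this (using the Northwind Customers table; the
procedure name is illustrative). The VB code can then read the value from the
ADO Command object's Parameters collection, with no Recordset involved:

```sql
CREATE PROCEDURE GetCompanyName
   @CustomerID nchar(5),
   @CompanyName nvarchar(40) OUTPUT
AS
SELECT @CompanyName = CompanyName
FROM Customers
WHERE CustomerID = @CustomerID
GO
```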
19. If your application needs to insert a large binary value into an image data
column, perform this task using a stored procedure, not using an INSERT
statement embedded in your application. The reason is that the
application must first convert the binary value into a character string (which
doubles its size, thus increasing network traffic and taking more time) before it
can be sent to the server. And when the server receives the character string, it
then has to convert it back to the binary format (taking even more time). Using a
stored procedure avoids all this. [6.5, 7.0, 2000]
20. When ADO is used to open more than one ForwardOnly recordset on a
single Connection object at a time, only the first recordset is opened using the
Connection object you previously created. Additional new connections don't use
the same Connection object. Instead, separate connections are created for each
ForwardOnly recordset you create after the first. This occurs because SQL Server
can only open one ForwardOnly cursor per connection. The more connections you
create, the greater the stress on SQL Server, and the more performance and
scalability suffer.
To avoid this problem, don't use a ForwardOnly recordset. Static, Keyset, and
Dynamic recordsets don't have this problem. Another option is to use a client side
cursor instead of SQL Server cursor. Or you can close each recordset before
opening another on the same connection. [6.5, 7.0, 2000]
23. When using recordsets, be sure to open them explicitly, not implicitly.
When recordsets are opened implicitly, you cannot control the default cursor and
lock types, which are, respectively, forward-only and read-only. If you always
open your recordsets explicitly, then you can specify which cursor and lock types
you want to invoke for this particular situation, specifying the types with the least
amount of overhead to accomplish the task at hand. [6.5, 7.0, 2000] Added 12-
14-2000
24. When using ADO to make connections to SQL Server, always be sure
you explicitly close any Connection, Recordset, or Command objects you
have opened. While letting an object go out of scope will in effect close the
object, it is not the same as explicitly closing an object. By explicitly closing these
objects and setting them to nothing, you do two things. First, you remove the
object sooner than later, helping to free up resources. Second, you eliminate the
possibility of "connection creep". Connection creep occurs when connection or
resource pooling is used and when connections are not properly closed and
released from the pool. This helps to defeat the purpose of pooling and reduces
SQL Server's performance. [6.5, 7.0, 2000]
25. If you are connecting to SQL Server via either OLE DB (version 2.0 or higher)
or ODBC (version 3.0 or higher), SQL Server connection pooling is
automatically implemented for you. Because of this, you don't have to write
special code to implement connection pooling yourself. In addition, you don't
even need to reuse an ADO Connection object, which is commonly done by many
VB developers.
If you want to take the best advantage of database connection pooling, and
optimize your VB application's SQL Server data access, the best advice you can
receive is to be sure that you only open a database connection just before you
need it, and then close it immediately after you are done with it. Don't leave
database connections open if you are not using them.
When you create or tear down a database connection in your VB code, you aren't
really creating a new connection or tearing down a current connection. What is
happening is that your connection requests are sent to OLE DB or ODBC, and
they determine if a connection needs to be created or torn down. If a new
connection is needed, then one is created, or one is used from the current
connection pool. And if you request that a connection be torn down, it will
actually pool the unused connection until it is needed, or tear it down if it is not
reused within a given time period. [6.5, 7.0, 2000] Updated 8-28-2001
26. In order for connection pooling to work correctly, be sure each connection
you open uses the same ConnectionString parameters. Connection pooling
only works if all of the parameters for the ConnectionString are identical. If they
are all not identical, then a new connection will be opened, circumventing
connection pooling. [6.5, 7.0, 2000] Added 2-5-2001
27. If appropriate for your application, locate the application's data access
components on the SQL Server where the data is, instead of at the client.
This can significantly reduce network traffic and overhead and boost data
throughput. [6.5, 7.0, 2000]
30. If you need your VB application to generate a unique value for use in a
primary key column in a SQL Server table, performance will be slightly better if
you let SQL Server, instead of your VB application, create the unique value. SQL
Server can generate unique keys using either an Identity (using the Integer data
type) column or by using the NEWID function in a UniqueIdentifier column. Of
these two, Identity columns offer better performance. [6.5, 7.0, 2000]
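For example (the table and column names are illustrative), an Identity column
lets SQL Server hand out the key itself, and SCOPE_IDENTITY() (or @@IDENTITY
on releases before SQL Server 2000) returns the value just generated:

```sql
CREATE TABLE Invoices (
   InvoiceID int IDENTITY(1,1) PRIMARY KEY,
   Amount money NOT NULL
)
GO
INSERT INTO Invoices (Amount) VALUES (19.95)
SELECT SCOPE_IDENTITY() AS NewInvoiceID
```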
31. When creating COM components to access SQL Server, try to design
the component to have as few properties as possible. For example, instead
of having a property for every column of data you want to send back or forth
between the database and your application, create one generic property that can
be used to send all of the columns at one time. What this does is reduce the
number of calls that must be made by the component, reducing overhead on the
component and SQL Server. [6.5, 7.0, 2000]
32. When setting Connection object properties, use the following dot notation
instead of using the fully qualified object property notation, whenever
appropriate, as it is faster.
With cn
.ConnectionTimeout = 100
.ConnectionString = "xyz"
.CursorLocation = adUseClient
End With
instead of:
cn.ConnectionTimeout = 100
cn.ConnectionString = "xyz"
cn.CursorLocation = adUseClient
' Declarations reconstructed so the timing example is complete; names are illustrative
Private Declare Function GetTickCount Lib "kernel32" () As Long

Private Sub TimeDotNotation()
Dim cn As New ADODB.Connection
Dim Fast As Long, Slow As Long, i As Long
For i = 1 To 100000
Fast = Fast - GetTickCount
With cn
.ConnectionTimeout = 100
.ConnectionString = "xyz"
.CursorLocation = adUseClient
End With
Fast = Fast + GetTickCount
Slow = Slow - GetTickCount
cn.ConnectionTimeout = 100
cn.ConnectionString = "xyz"
cn.CursorLocation = adUseClient
Slow = Slow + GetTickCount
Next
MsgBox "Fast=" & Fast & vbCrLf & "Slow=" & Slow
End Sub
1. Don't use VB objects to act as a data holder (to store data) in your
SQL Server-based applications. Instead, use an array or a collection of user-
defined types (UDTs). While using objects you create to store data can be
convenient, it also creates a lot of unnecessary overhead. Each time you
instantiate and then destroy an object, performance and scalability suffer. How do
you know if an object you have created is storing data that should be stored
elsewhere? If the class has mostly properties and few if any methods, then this is
a good clue. [6.5, 7.0, 2000] Added 10-26-2000
4. If you decide not to use a stored procedure to access SQL Server, and
instead choose to use an embedded SQL statement in your VB code, and if that
embedded SQL statement will be repeated, such as in a loop, consider setting the
ADO Command object's "Prepared" property to "True".
This property tells SQL Server to compile and save a copy of your SQL statement
in its cache. The first time the SQL statement is executed, the statement has to
be compiled and stored in memory; on subsequent calls, the cached statement is
reused, boosting performance because it does not have to be recompiled each
time it is called.
If the SQL statement will only be executed once, don't set this option, as
preparing a statement that runs only once actually decreases performance. The
performance boost only comes if the SQL statement is run multiple times.
[7.0, 2000] Added 12-27-2000
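Under the covers, a prepared statement gives much the same benefit as a
parameterized call to sp_executesql: the plan is compiled once and reused for
each new parameter value. As a rough Transact-SQL sketch of the idea (the
customer table and column names are hypothetical, borrowed from the formatting
example later in these tips):

EXEC sp_executesql
N'SELECT customer_name FROM customer WHERE customer_number = @custno',
N'@custno int',
@custno = 1000

Each execution with a different @custno value can reuse the same cached plan,
just as a prepared ADO Command does.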
And the biggest difference is that they are much faster, about twice as fast. If you
haven't learned about dictionaries yet, you need to take the time now to learn
about their numerous advantages. [6.5, 7.0, 2000] Added 12-27-2000
7. If you have any legacy VB applications that still use VB-SQL to
access SQL Server, you may want to consider rewriting them. VB-SQL not only
provides slow access, it is no longer supported by Microsoft. [6.5] Added 1-2-
2001
8. If you are the sort of VB developer who likes to design applications
around objects, keep in mind that over-encapsulating data access within
objects can hurt performance. For example, from an OO design approach, you
might consider encapsulating data access to each individual table in a SQL
Server database, creating a separate class for each table. While this may
appeal to your OO design goals, it is inefficient from a performance
perspective.
Too much encapsulation prevents you from taking advantage of SQL Server's
built-in optimization abilities, causes too many round trips to the database,
and can use more database connections than absolutely required.
Instead of over-encapsulating your data access in classes, a more efficient
approach is to use stored procedures to encapsulate your business logic. Stored
procedures eliminate these three drawbacks. [6.5, 7.0, 2000] Added 7-19-2001
sp_executesql, then you need to review your ADO code, looking for ways to
optimize it. [7.0, 2000] Added 8-31-2001
10. Limit the number of rows you return from a database to populate a
pick-list or drop-down box. Returning many rows not only slows down your
application, it also makes it less convenient for your users to select the item
or items they need. Have you ever had to select from over 100 choices? It is
not easy.
If you need to give your users a lot of choices, instead of displaying them in
one large pick-list or drop-down list, provide a way for them to filter out any
options that are not applicable. For the best performance, perform the
filtering at the client, not the SQL Server.
Ideally, you should use a stored procedure to retrieve the minimum number of
rows you need; then, if there are still a lot of rows to deal with (from the
user's perspective), provide a mechanism for the user to filter the list using
the various ADO methods available for local filtering. This reduces the number
of round trips from the client to SQL Server, helping to boost performance.
[6.5, 7.0, 2000] Added 12-11-2001
Miscellaneous SQL Server Performance Tuning Tips
1. If you need to delete all the rows in a table, don't use DELETE to delete
them all, as the DELETE statement logs each deleted row and can take time. To
perform the same task much faster, use TRUNCATE TABLE instead, which is only
minimally logged. Besides deleting all of the records in a table, this command
also resets the seed of any IDENTITY column back to its original value. [6.5,
7.0, 2000] Updated 7-3-2001
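As a quick sketch of the difference (the table name is hypothetical):

-- Logs every deleted row, so this can take a long time on a large table
DELETE FROM big_table

-- Minimally logged and much faster; also reseeds any IDENTITY column
TRUNCATE TABLE big_table

Keep in mind that TRUNCATE TABLE cannot be used on a table that is referenced
by a FOREIGN KEY constraint.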
3. Use sp_who and sp_who2 (sp_who2 is not documented in the SQL Server
Books Online, but offers more details than sp_who) to provide locking and
performance-related information about current connections to SQL Server. [6.5,
7.0, 2000]
5. For a quick-and-dirty way to check whether your SQL Server has
maxed out its memory (and is causing your server to page), try this. Bring up
Task Manager and go to the "Performance" tab.
Here, check out two numbers: the "Total" under "Commit Charge (k)" and the
"Total" under "Physical Memory (k)". If the "Total" under "Commit Charge (k)" is
greater than the "Total" under "Physical Memory (k)", then your server does not
have enough physical memory to run efficiently as it is currently configured and
is most likely causing your server to page unnecessarily. Excess paging will slow
down your server's performance.
If you notice this problem, you will probably want to use the
Performance Monitor to further investigate the cause of this problem. You will
also want to check to see how much physical memory has been allocated to SQL
Server. Most likely, this setting has been set incorrectly, and SQL Server has been
set to use too much physical memory. Ideally, SQL Server should be set to
allocate physical RAM dynamically. [6.5, 7.0, 2000]
6. Internet Information Server (IIS) has the ability to send its log files
directly to SQL Server for storage. Busy IIS servers can actually get bogged
down trying to write log information directly to SQL Server, so it is
generally not recommended to write web logging information to SQL Server.
Instead, write the logs to text files, and later import them into SQL Server
using BCP or DTS. [6.5, 7.0, 2000]
7. SQL Server 7 has a database compatibility mode that allows applications
written for previous versions of SQL Server to run under SQL Server 7. If you
want maximum performance for your database, you don't want to run your
database in compatibility mode. Instead, it should be running in native SQL
Server 7 mode. Of course, this may require you to modify your application to
make it SQL Server 7 compliant, but in most cases, the additional work required
to update your application will be more than paid for with improved performance.
[7.0, 2000]
8. When experimenting with the tuning of your SQL Server, you may want
to run the DBCC DROPCLEANBUFFERS command to remove all the test data from
SQL Server's data cache (buffer) between tests to ensure fair testing. If you want
to clear out the stored procedure cache, use this command, DBCC
FREEPROCCACHE. Both of these commands are for testing purposes and should
not be run on a production SQL Server. [7.0, 2000]
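For example, a typical between-tests sequence on a test server might look like
this (the CHECKPOINT step is a common precaution to flush dirty pages first,
and is an addition to the two commands named above):

CHECKPOINT            -- write dirty pages to disk so the buffer drop is complete
DBCC DROPCLEANBUFFERS -- empty the data cache (buffer pool)
DBCC FREEPROCCACHE    -- empty the stored procedure (plan) cache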
For example, one of the easiest ways to speed up our Transact-SQL coding, along
with maintaining and troubleshooting our code once it is written, is to
format it in an easy-to-read style.
While there are many different code formatting guidelines available, here are
some basic ones you should consider following, if you aren't doing so already:
· Begin each line of your Transact-SQL code with a SQL verb, and capitalize
all Transact-SQL statements and clauses, such as:
SELECT customer_number, customer_name
FROM customer
WHERE customer_number > 1000
ORDER BY customer_number
· If a line of Transact-SQL code is too long to fit onto one line, indent the
following line(s), such as:
SELECT customer_number, customer_name, customer_address,
customer_state, customer_zip, customer_phonenumber
· Separate logical groupings of Transact-SQL code with appropriate
comments and documentation explaining what each grouping does.
These are just a few of the many possible guidelines you can follow when writing
your Transact-SQL code to make it more readable by you and others. You just
need to decide on some standard, and then always follow it in your coding. If you
do this, you will definitely boost your coding performance. [6.5, 7.0, 2000] Added
12-5-2000
14. SQL Server 2000 offers support for SSL encryption between clients and
the server. While selecting this option prevents the data from being viewed in
transit, it also adds overhead and reduces performance. Only use SSL
encryption if absolutely required. If you must use SSL encryption, consider
purchasing an SSL encryption processor for the server to speed performance.
[2000] Added 9-21-2000
same server; or to run SQL Server 6.5, SQL Server 7.0, and SQL Server 2000 on
the same server; and to run up to 16 concurrent instances of SQL Server 2000
on the same server.
As you might imagine, each running instance of SQL Server takes up server
resources. Although some resources, such as MSDTC and the Microsoft Search
services, are shared by multiple running instances, most are not. Because of
this, each additional instance of SQL Server running on the same server has to
fight for available resources, hurting performance. For best performance, run
only a single instance (usually the default) on a single physical server. The
main reasons for using named instances are upgrading older versions of SQL
Server to SQL Server 2000, transition periods where you need to test your
applications on multiple versions of SQL Server, and use on developers'
workstations. [2000] Added 11-14-2000
2. If you run the ALTER TABLE DROP COLUMN statement to drop a variable-
length or text column, SQL Server will not automatically reclaim the space
afterward. To reclaim this space, which will help reduce unnecessary I/O due
to the wasted space, you can run the following command, which is new to SQL
Server 2000.
DBCC CLEANTABLE (database_name, table_name)
Before running this command, you will want to read about it in Books Online to
learn about some of its options that may be important to you. [2000] Added 2-28-
2001
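For example (the database, table, and column names are hypothetical):

ALTER TABLE orders DROP COLUMN notes -- space is not reclaimed automatically
GO
DBCC CLEANTABLE ('sales_db', 'orders') -- reclaims the space left behind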
3. Trace flags, which are used to enable and disable some special
database functions temporarily, can often chew up CPU utilization and
other resources on your SQL Server unnecessarily. If you just use them for
a short time to help diagnose a problem, for example, and then turn them off as
soon as you are done using them, the performance hit you experience is
small and temporary.
What happens sometimes is that you, or another DBA, turns on a trace flag but
forgets to turn it off. This, of course, can negatively affect your SQL
Server's performance. If you want to check whether any trace flags are turned
on for a SQL Server, run this command in Query Analyzer:
DBCC TRACESTATUS(-1)
If there are any trace flags on, you will see them listed on the screen after
running this command. DBCC TRACESTATUS only finds traces created at the
client (connection) level. If a trace has been turned on for an entire server, this
will not show up.
If you find any, you can turn them off using this command:
DBCC TRACEOFF(<trace flag number>)
[7.0, 2000] Added 6-6-2001
4. SQL Server offers a feature called the black box. When enabled, the black
box creates a trace file of the last 128K worth of queries and exception errors.
This can be a great tool for troubleshooting some SQL Server problems, such as
crashes.
Unfortunately, this feature uses up SQL Server resources to maintain the trace
file, which can negatively affect performance. Generally, you will want to only
turn the black box on when troubleshooting, and turn it off during normal
production. This way, your SQL Server will be minimally affected. [7.0, 2000]
Added 6-6-2001
5. If you have ever performed a SELECT COUNT(*) on a very large table,
you know how long it can take. For example, I ran the following
command on a large table I manage:
SELECT COUNT(*) from <table_name>
It took 1:09 to count 10,725,948 rows in the table. At the same time, SQL Server
had to perform a lot of logical and physical I/O in order to perform the count,
chewing up important SQL Server resources.
A much faster, and more efficient, way of counting rows in a table is to run the
following query:
SELECT rows
FROM sysindexes
WHERE id = OBJECT_ID('<table_name>') AND indid < 2
When I ran this query against the same table, it took less than a second and
gave me the same result. Not a bad improvement, and it took virtually no
I/O. This is because the row count of your tables is stored in the sysindexes
system table of your database. So instead of counting rows when you need to,
just look up the row count in the sysindexes table.
There is one potential downside to using the sysindexes table: this system
table is not updated in real time, so it might misstate the number of rows
you actually have. Assuming you have the database options "Auto Create
Statistics" and "Auto Update Statistics" turned on, the value you get should be
very close to being correct, if not correct. If you can live with a very close estimate,
then this is the best way to count rows in your tables. [7.0, 2000] Added 7-3-
2001
6. Looking for some new tools to help performance tune your operating
system? Then check out the performance tools at Sysinternals. For example,
they have tools to defrag your server's swap file, among many others. And best
of all, most are free. [6.5, 7.0, 2000] Added 7-3-2001
· SQL Server dll version information
· The output from these system stored procedures:
· sp_configure
· sp_who
· sp_lock
· sp_helpdb
· xp_msver
· sp_helpextendedproc
· sysprocesses
· Input buffer SPIDs/deadlock information
· Microsoft diagnostics report for the server
· The last 100 queries and exceptions (if the query history trace was
running)
[7.0, 2000] Added 10-12-2001
sp_configure 'show advanced options', 1
GO
RECONFIGURE WITH OVERRIDE
GO
sp_configure 'max degree of parallelism', 0
GO
RECONFIGURE WITH OVERRIDE
GO
The "max degree of parallelism" option is an advanced option, so the first portion
of the code above is used to turn on the "show advanced option." Once that is
done, then you can set the "max degree of parallelism" option. By setting this
option to "0", you are telling SQL Server to use all available CPUs in the server.
See this Microsoft article for more information:
http://support.microsoft.com/support/kb/articles/Q273/8/80.ASP [7.0] Added
10-23-2001
10. Memory leaks can steal valuable memory from your SQL Server,
reducing performance, and perhaps even forcing you to reboot your server. A
memory leak occurs when a poorly-written or buggy program requests memory
from the operating system, but does not release the memory when it is done with
it. Because of this, the application can use up more and more memory in a
server, greatly slowing it down, and even perhaps crashing the server.
Some memory leaks come from the operating system itself, device drivers,
MDAC components, and even SQL Server. And of course, virtually any application
can cause a memory leak, which is another good reason to dedicate a single
server to SQL Server instead of sharing it among multiple applications.
Memory leaks are often hard to identify, especially if they leak memory
slowly. Generally, memory leaks become apparent when you notice that your
server is running out of available memory and paging becomes a big problem. A
symptom of this is a SQL Server that runs quickly after being rebooted, but
begins to run more and more slowly as time passes, and when the system is
rebooted again, it speeds up again.
One way to help get rid of many memory leaks is to ensure that you always
have the latest service packs or updates for your server's software. But a
memory leak you find may not have an immediate fix. If this is the case, you
may be forced to reboot your server periodically in order to free up memory.
Identifying what is causing a memory leak is often difficult. One method
involves using Performance Monitor to monitor all of the counters in the
Memory object over time, watching what is happening internally in your
computer. Another method
is to use Task Manager to view how much memory is used by each process. A
process that seems to be using an unusual amount of memory may be the
culprit. [6.5, 7.0, 2000] Added 12-11-2001
Keep in mind the word "considered". An index created to support the speed of a
particular query may not be the best index for another query on the same table.
Sometimes you have to balance indexes to attain acceptable performance on all
the various queries that are run against a table. [6.5, 7.0, 2000] Updated 12-7-
2001
3. Don't over-index your OLTP tables, as every index you add increases the
time it takes to perform INSERTs, UPDATEs, and DELETEs. A fine line must be
drawn between having the ideal number of indexes (for SELECTs) and the
ideal number for data modifications. [6.5, 7.0, 2000]
4. Don't automatically add indexes on a table because it seems like the right
thing to do. Only add indexes if you know that they will be used by the queries
run against the table. [6.5, 7.0, 2000]
5. Don't accidentally add the same index twice on a table. This happens
more easily than you might think. For example, you add a unique or primary key
constraint to a column, which of course creates an index to enforce the
constraint. But later, when evaluating the need for indexes on the table,
you add a new index without thinking about it, and this new index happens to be
on the same column as the unique or primary key. As long as you give the
indexes different names, SQL Server will let you create the same index over
and over. [7.0, 2000] Added 4-2-2001
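A sketch of how the duplication sneaks in (the names are hypothetical):

CREATE TABLE t1 (col1 int NOT NULL PRIMARY KEY) -- creates a unique index on col1
GO
-- later, this redundant index is accepted simply because its name differs
CREATE INDEX ix_t1_col1 ON t1 (col1)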
6. If you have a table that is subject to many INSERTS, be sure that you
have added a clustered index to it (not based on an incrementing key), whether
or not it really needs one. A table that does not have a clustered index is called a
heap. Every time data is INSERTed into a heap, the row is added to the end of
the table. If there are many INSERTS, this spot could become a "hot spot" which
could significantly affect performance. By adding a clustered index to the table
(not based on an incrementing key), any potential hotspots are avoided. [6.5,
7.0, 2000] Added 2-22-2001
7. Drop indexes that are never used by the Query Optimizer. Unused
indexes slow data modifications and waste space in your database, increasing the
amount of time it takes to back up and restore databases. Use the Index Wizard
(7.0 and 2000) to help identify indexes that are not being used. [6.5, 7.0, 2000]
8. Generally, you probably won't want to add an index to a table under
these conditions:
· If the index is not used by the query optimizer. Use Query Analyzer's
"Show Execution Plan" option to see if your queries against a particular table
use an index or not. If the table is small, most likely indexes will not be used.
· If the column values exhibit low selectivity, often less than 90%-95% for
non-clustered indexes.
· If the column(s) to be indexed are very wide.
· If the column(s) are defined as TEXT, NTEXT, or IMAGE data types.
· If the table is rarely queried.
9. While high index selectivity is generally an important factor that the Query
Optimizer uses to determine whether or not to use an index, there is one special
case where indexes with low selectivity can be useful in speeding up SQL Server.
This is the case for indexes on foreign keys. Whether it has high or low
selectivity, an index on a foreign key can be used by the Query Optimizer to
perform a merge join on the tables in
question. A merge join occurs when a row from each table is taken and then
they are compared to see if they match the specified join criteria. If the tables
being joined have the appropriate indexes (no matter the selectivity), a merge
join can be performed, which is generally much faster than a join to a table with
a foreign key that does not have an index. [7.0, 2000] Added 4-9-2001
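SQL Server does not create an index on a foreign key column automatically, so
you have to add it yourself. A sketch (the table, column, and index names are
hypothetical):

CREATE INDEX ix_orders_customer_number ON orders (customer_number)
GO
-- with indexes on both sides of the join key, a merge join becomes possible
SELECT c.customer_name, o.order_date
FROM customer c
JOIN orders o ON c.customer_number = o.customer_number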
10. On data warehousing databases, which are essentially read-only, having as
many indexes as necessary to cover virtually any query is not normally a
problem. [6.5, 7.0, 2000]
11. To provide the up-to-date statistics the query optimizer needs to make
smart query optimization decisions, you will generally want to leave the
"Auto Update Statistics" database option on. This helps to ensure that the
optimizer statistics are valid, helping to ensure that queries are properly
optimized when they are run.
But this option is not a panacea. When a SQL Server database is under very
heavy load, sometimes the auto update statistics feature can update the statistics
at inappropriate times, such as the busiest time of the day.
If you find that the auto update statistics feature is running at inappropriate
times, you may want to turn it off, and then manually update the statistics (using
UPDATE STATISTICS or sp_updatestats) when the database is under a less heavy
load.
But again, consider what will happen if you do turn off the auto update
statistics feature. While turning this feature off may reduce some stress on
your server by not running at inappropriate times of the day, it could also
cause some of your queries not to be properly optimized, which could likewise
put extra stress on your server during busy times.
Like many optimization issues, you will probably need to experiment to see
whether turning this option on or off is more effective for your environment.
But as a rule of thumb, if your server is not maxed out, leaving this option
on is probably the best decision. [7.0, 2000]
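If you do decide to take over manually, the pattern might look like this (the
database name is hypothetical; the ALTER DATABASE SET syntax is SQL Server
2000 only):

ALTER DATABASE sales_db SET AUTO_UPDATE_STATISTICS OFF
GO
-- then, during a low-load window:
EXEC sp_updatestats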
12. Keep the "width" of your indexes as narrow as possible, especially when
creating composite (multi-column) indexes. This reduces the size of the index
and reduces the number of reads required to read the index, boosting
performance. [6.5, 7.0, 2000]
13. If possible, try to create indexes on columns that have integer values instead
of characters. Integer values have less overhead than character values. [6.5, 7.0,
2000]
2. Even if the WHERE clause in a query does not specify the first column of an
available index (which normally disqualifies the index from being used), if the
index is a composite index and contains all of the columns referenced in
the query, the query optimizer can still use the index, because the index is a
covering index. [6.5, 7.0, 2000]
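For example, reusing the customer table from the formatting example above (the
index name is hypothetical):

CREATE INDEX ix_customer_cover ON customer (customer_name, customer_number)
GO
-- customer_name, the leading column, is not in the WHERE clause, but both
-- referenced columns live in the index, so the index covers the query
SELECT customer_name, customer_number
FROM customer
WHERE customer_number > 1000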
3. When you create an index with a composite key, the order of the columns
in the key is important. Try to order the columns in the key so as to enhance
selectivity, with the most selective columns leftmost in the key. If you
don't do this, and put a non-selective column at the front of the key, you
risk having the Query Optimizer not use the index at all. Generally, a column
should be at least 95% unique in order for it to be considered selective. [6.5, 7.0,
2000] Updated 3-6-2001
5. If you have two or more tables that are frequently joined together,
the columns used for the joins should have an appropriate index. If the
columns used for the joins are not naturally compact, consider adding compact
surrogate keys to the tables in order to reduce the size of the
keys, thus decreasing I/O during the join process and increasing overall
performance. [6.5, 7.0, 2000]
8. If you like to get under the covers of SQL Server to learn more about
indexing, take a look at the sysindexes system table that is found in every
database. Here, you can find a wealth of information on the indexes and tables
in your database. To view the data in this table, run this query from the
database you are interested in:
SELECT *
FROM sysindexes
Here are some of the more interesting fields found in this table:
· dpages: If the indid value is 0 or 1, then dpages is the count of the data
pages used for the index. If the indid is 255, then dpages equals zero. In all
other cases, dpages is the count of the non-clustered index pages used in the
index.
· id: Refers to the id of the table this index belongs to.
· indid: This column indicates the type of index. For example, 1 is for a
clustered table, a value greater than 1 is for a non-clustered index, and a 255
indicates that the table has text or image data.
· OrigFillFactor: This is the original fillfactor used when the index was first
created, but it is not maintained over time.
· statversion: Tracks the number of times that statistics have been
updated.
· status: 2 = unique index, 16 = clustered index, 64 = index allows
duplicate rows, 2048 = the index is used to enforce the Primary Key
constraint, 4096 = the index is used to enforce the Unique constraint. These
values are additive, and the value you see in this column may be a sum of
two or more of these options.
· used: If the indid value is 0 or 1, then used is the number of total pages
used for all index and table data. If indid is 255, used is the number of pages
for text or image data. In all other cases, used is the number of pages in the
index.
9. Don't use FLOAT or REAL data types for primary keys, as they add
unnecessary overhead and can hurt performance. [6.5, 7.0, 2000] Added 10-4-
2000
10. If your WHERE clause includes an AND operator, one way to optimize it is
to ensure that at least one of the search criteria is highly selective and has
an index on the relevant column. [6.5, 7.0, 2000] Added 10-17-2000
11. The Query Optimizer will always perform a table scan or a clustered
index scan on a table if the WHERE clause in the query contains an OR operator
and any of the referenced columns in the OR clause are not indexed (or lack
a useful index). Because of this, if you use many queries with OR
clauses, you will want to ensure that each referenced column has an index. [7.0,
2000] Added 10-17-2000
13. If you use the SOUNDEX function against a table column in a WHERE
clause, the Query Optimizer will ignore any available indexes and
perform a table scan. If your table is large, this can present a major
performance problem. If you need to perform SOUNDEX type searches, one way
around this problem is to precalculate the SOUNDEX code for the column you are
searching and then place this value in a column of its own, and then place an
index on this column in order to speed searches. [6.5, 7.0, 2000] Added 11-9-
2000
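On SQL Server 2000 you can let a computed column do the precalculation and then
index it (the names are hypothetical; on 6.5/7.0 you would maintain an ordinary
column instead, for example with a trigger):

ALTER TABLE customer ADD soundex_name AS SOUNDEX(customer_name)
GO
CREATE INDEX ix_customer_soundex ON customer (soundex_name)
GO
-- this search can now seek on the index instead of scanning the table
SELECT customer_name FROM customer WHERE soundex_name = SOUNDEX('Smith')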
14. If you need to create indexes on large tables, you may be able to speed
up their creation by using the new SORT_IN_TEMPDB option available with the
CREATE INDEX command. This option tells SQL Server to use the tempdb
database, instead of the current database, to sort data while creating indexes.
Assuming your tempdb database is isolated on its own separate disk or disk
array, then the process of creating the index can be sped up. The only slight
downside to using this option is that it takes up slightly more disk space than if
you didn't use it, but this shouldn't be much of an issue in most cases. If your
tempdb database is not on its own disk or disk array, then don't use this option,
as it can actually slow performance. [2000] Added 10-19-2000
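For example (the table and index names are hypothetical):

CREATE NONCLUSTERED INDEX ix_orders_customer_number
ON orders (customer_number)
WITH SORT_IN_TEMPDB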
15. SQL Server 2000 Enterprise Edition (not the standard edition) offers the
ability to create indexes in parallel, greatly speeding index creation.
Assuming your server has multiple CPUs, SQL Server 2000 uses near-linear
scaling to boost index creation speed. For example, using two CPUs instead of
one almost halves the time it takes to create indexes. [2000] Added 12-
19-2000
16. As you probably already know, indexes on narrow columns are preferable to
indexes on wide columns. The narrower the index, the more entries SQL Server
can fit on a data page, which in turn reduces the amount of I/O required to
access the data. But sometimes the column you want to search on using an index
is much wider than desirable.
For example, let's say you have a music database that lists the titles of over
5,000,000 songs, and that you want to search by song title. Also assume that the
column used to store the music titles is a VARCHAR(45). Forty-five characters is a
very wide index, and creating an index on such a wide column is not wise from a
performance perspective. So how do we deal with such a scenario?
SQL Server 2000 offers a new function called CHECKSUM. The main
purpose of this function is to create what are called hash indexes. A hash
index is an index built on a column that stores the checksum of the data found
in another column in the table. The CHECKSUM function takes data from another
column and creates a checksum value. In other words, the CHECKSUM function is
used to create a mostly unique value that represents other data in your table.
In most cases, the CHECKSUM value will be much smaller than the actual value.
For the most part, checksum values are unique, but this is not guaranteed: it
is possible for two slightly different values to produce the same CHECKSUM
value.
Here's how this works using our music database example. Say we have a
song with the title "My Best Friend is a Mule from Missouri". As you can see, this
is a rather long value, and adding an index to the song title column would make
for a very wide index. But in this same table, we can add a CHECKSUM column
that takes the title of the song and creates a checksum based on it. In this case,
the checksum would be 1866876339. The CHECKSUM function is deterministic: if
you apply it to the same value many different times, you always get the same
result.
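One way to add such a column is as an indexed computed column (the exact schema
here is an assumption; only the songs table and its title column come from the
example):

ALTER TABLE songs ADD checksum_title AS CHECKSUM(title)
GO
CREATE INDEX ix_songs_checksum_title ON songs (checksum_title)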
So how does the CHECKSUM help us? The advantage of the CHECKSUM function
is that instead of creating a wide index by using the song title column, we create
an index on the CHECKSUM column instead. "That's fine and dandy, but I thought
you wanted to search by the song's title? How can anybody ever hope to
remember a checksum value in order to perform a search?"
Here's how. Take a moment to review this code:
SELECT title
FROM songs
WHERE title = 'My Best Friend is a Mule from Missouri'
AND checksum_title = CHECKSUM('My Best Friend is a Mule from Missouri')
In this example, it appears that we are asking the same question twice, and
in a sense, we are. The reason we have to do this is because there may be
checksum values that are identical, even though the names of the songs are
different. Remember, unique checksum values are not guaranteed.
Here's how the query works. When the Query Optimizer examines the WHERE
clause, it determines that there is an index on the checksum_title column. And
because the checksum_title column is highly selective (minimal duplicate values)
the Query Optimizer decides to use the index. In addition, the Query Optimizer is
able to perform the CHECKSUM function, converting the song's title into a
checksum value and using it to locate the matching records in the index. Because
an index is used, SQL Server can very quickly locate the rows that match the
second part of the WHERE clause. Once the rows have been narrowed down by
the index, then all that has to be done is to compare these matching rows to the
first part of the WHERE clause, which will take very little time.
This may seem like a lot of work just to shorten the width of an index, but in
many cases the extra work will pay off in better performance in the long run. Because
of the nature of this tip, I suggest you experiment using this method, and the
more conventional method of creating an index on the title column itself. Since
there are so many variables to consider, it is tough to know which method is
better in your particular situation unless you give them both a try. [2000] Added
3-6-2001
17. There is a bug in SQL Server 7.0 and 2000, as yet uncorrected, that can
negatively affect the performance of some queries. Queries that
have multiple OR clauses based on a clustered composite index
may ignore the index and perform a table scan instead. This bug only appears if
the query is within a stored procedure, or if it is executed through an ODBC-
based application, such as VB, ASP, or Microsoft Access.
The best way to identify if you are experiencing this bug is to view the Query Plan
for slow queries that fit the criteria above, and see if a table scan is being
performed, or if the index is being used.
There are five different possible workarounds for this bug, depending on your
circumstances:
· Use an appropriate index hint to force the use of the composite index.
This is probably the easiest method to get around this problem, and the one I
recommend.
· Changing from a clustered composite index to a non-clustered composite
index may help, but this is not guaranteed. You will have to test it for
yourself.
· Rewrite the query using a UNION clause to combine the results returned
from the OR clauses. Of course, using a UNION clause may itself degrade
performance. You will have to test this option to see if it is faster or not.
· If the query is being executed from an ODBC application through the
SQLPrepare function with the SQL Server ODBC driver version 3.6 or earlier,
then you can disable the "Generate Stored Procedures for Prepared
Statements" option to workaround the bug.
· If the query is being executed from an ODBC application through either the
SQLPrepare or SQLExecDirect functions with a parameterized query using the SQL
Server ODBC driver version 3.7, you can use the odbccmpt utility to enable the SQL
Server 6.5 ODBC compatibility option and also disable the "Generate Stored
Procedures for Prepared Statements" option to work around the bug.
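The first and third workarounds can be sketched briefly; the table, index, and
column names here are hypothetical, chosen only to illustrate the pattern:

```sql
-- Workaround 1: force the composite clustered index with a hint.
SELECT *
FROM Orders WITH (INDEX (IX_Orders_CustState))
WHERE CustomerID = 5 OR State = 'CA'

-- Workaround 3: rewrite the OR clauses as a UNION, so each branch
-- can use the index on its own.
SELECT * FROM Orders WHERE CustomerID = 5
UNION
SELECT * FROM Orders WHERE State = 'CA'
```

As the tip notes, test both forms: the UNION removes duplicates, which adds
its own cost.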
SQL Server Performance Tuning Tips for Stored Procedures
2. To help identify performance problems with stored procedures, use SQL
Server Profiler's Create Trace Wizard to run the "Profile the Performance of a
Stored Procedure" trace. It provides the data you need to identify poorly
performing stored procedures. [7.0, 2000]
5. When a stored procedure is first executed (and it does not have the
WITH RECOMPILE option), it is optimized and a query plan is compiled and
cached in SQL Server's buffer. If the same stored procedure is called again from
the same connection, it will use the cached query plan instead of creating a new
one, saving time and boosting performance. This may or may not be what you
want.
If the query in the stored procedure is exactly the same each time, then this is a
good thing. But if the query is dynamic (the WHERE clause changes from one
execution of the stored procedure to the next), then this is a bad thing, as the
cached plan may not suit the new form of the query, and the performance of the
query can suffer greatly.
If you know that your query will vary each time it is run from the stored
procedure, you will want to add the WITH RECOMPILE option when you create the
stored procedure. This forces the stored procedure to be recompiled each time
it is run, ensuring the query is optimized for each execution. [6.5, 7.0,
2000]
6. Design your application to allow your users to cancel running queries.
Not doing so may force the user to reboot the client, which can cause
unresolvable performance problems. [6.5, 7.0, 2000]
7. Many stored procedures have the option to accept multiple parameters. This
in and of itself is not a bad thing. But what can often cause problems is if the
parameters are optional, and the number of parameters varies greatly each
time the stored procedure runs. There are two ways to handle this problem,
the slow-performance way and the fast-performance way.
If you want to save your development time, but don't care about your
application's performance, you can write your stored procedure generically so
that it doesn't care how many parameters it gets. The problem with this method
is that you may end up unnecessarily joining tables that don't need to be joined
based on the parameters submitted for any single execution of the stored
procedure.
Another, much better performing way, although it will take you more time to
code, is to include IF...ELSE logic in your stored procedure, and create separate
queries for each possible combination of parameters that can be submitted to
the stored procedure. This way, you can be sure your query is as efficient as
possible each time it runs. [6.5, 7.0, 2000] Added 12-29-2000
Although the above tip is a good starting point, it's not complete. The problem is
the query-plans, the pre-compilation of stored procedures, that SQL Server does
for you. As you know, one of the biggest reasons to use stored procedures
instead of ad-hoc queries is the performance gained by using them. The problem
that arises with the above tip is that SQL Server will only generate a query-plan
for the path taken through your stored procedure when you first call it, not all
possible paths.
Let me illustrate this with an example. Consider the following procedure (pre-
compilation doesn't really have a huge effect on the queries used here, but these
are just for illustration purposes):
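The original listing did not survive in this copy; the following reconstruction
is based on the description below, which gives only the name spTest, the @query
parameter, and the two SELECT statements:

```sql
CREATE PROCEDURE spTest
    @query int
AS
IF @query = 0
    SELECT * FROM authors
ELSE
    SELECT * FROM publishers
GO
```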
Suppose I make my first call to this procedure with the @query parameter set to
0. The query-plan that SQL Server will generate will be optimized for the first
query ("SELECT * FROM authors"), because the path followed on the first call will
result in that query being executed.
Now, if I next call the stored procedure with @query set to 1, the query plan that
SQL Server has in memory will not be of any use in executing the second query,
since the query-plan is optimized for the authors table, not the publishers table.
Result: SQL Server will have to compile a new query plan, the one needed for the
second query. If I next call the procedure with @query set to 0 again, the whole
path will have to be followed from the start again, since only one query-plan will
be kept in memory for each stored procedure. This will result in sub-optimal
performance.
As it happens I have a solution, one that I've used a lot with success. It involves
the creation of what I like to call a 'delegator'. Consider again spTest. I propose
to rewrite it like this:
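This listing is also missing; here is a sketch consistent with the procedure
names the discussion below relies on (spTestFromAuthors and
spTestFromPublishers):

```sql
CREATE PROCEDURE spTestFromAuthors
AS
SELECT * FROM authors
GO

CREATE PROCEDURE spTestFromPublishers
AS
SELECT * FROM publishers
GO

-- The delegator holds no queries of its own; it only dispatches.
CREATE PROCEDURE spTest
    @query int
AS
IF @query = 0
    EXEC spTestFromAuthors
ELSE
    EXEC spTestFromPublishers
GO
```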
The result of this restructuring will be that there will always be an optimized
query-plan for spTestFromAuthors and spTestFromPublishers, since they only hold
one query. The only one getting re-compiled over and over again is the delegator,
but since this stored procedure doesn't actually hold any queries, that won't have
a noticeable effect on execution time. Of course re-compiling a plan for a simple
'SELECT *' from a single table will not give you a noticeable delay either (in fact,
the overhead of an extra stored procedure call may be bigger than the
re-compilation of "SELECT * FROM AnyTable"), but as soon as the queries get
bigger, this method certainly pays off.
The only downside to this method is that now you have to manage three stored
procedures instead of one. This is not that much of a problem though as the
different stored procedures can be considered one single 'system', so it would be
logical to keep all of them together in the same script, which would be just as
easy to edit as a single stored procedure would be. As far as security is
concerned, this method shouldn't give you any extra headaches either: because
the delegator is the only stored procedure directly called by the client, it is
the only one you need to manage permissions on. The rest will only be called by
the delegator, which will always work as long as those stored procedures are
owned by the same user as the delegator.
I've had great success using this technique. Recently I developed a (partial
full-text) search engine for our reports database, which resulted in a stored
procedure that originally ran about 20 seconds. After employing the above
technique, the stored procedure only took about 2 seconds to run, resulting in a
ten-fold increase in performance! [6.5, 7.0, 2000] Contributed by Jeremy van
Dijk. Added 8-15-2000
application can actually create contention in the system tables and hurt
performance. Instead of using temporary stored procedures, you may want to
consider using the sp_executesql stored procedure instead. It provides the
same benefits as temporary stored procedures, but it does not store data in the
system tables, avoiding the contention problems. [7.0, 2000]
10. If you are creating a stored procedure to run in a database other than
the Master database, don't use the prefix "sp_" in its name. This special
prefix is reserved for system stored procedures. Although using this prefix will
not prevent a user defined stored procedure from working, what it can do is to
slow down its execution ever so slightly.
The reason for this is that, by default, SQL Server first tries to resolve any
stored procedure whose name begins with the prefix "sp_" in the Master
database. Since the procedure is not there, time is wasted looking for it.
If SQL Server cannot find the stored procedure in the Master database, then it
next tries to resolve the stored procedure name as if the owner of the object is
"dbo". Assuming the stored procedure is in the current database, it will then
execute. To avoid this unnecessary delay, don't name any of your stored
procedures with the prefix "sp_". [6.5, 7.0, 2000] Tip contributed by Joey Allen.
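A brief illustration of the naming guideline (the procedure and table names
here are hypothetical):

```sql
-- Bad: SQL Server checks the Master database first, wasting a lookup.
CREATE PROCEDURE sp_GetCustomers
AS
SELECT * FROM Customers
GO

-- Better: no "sp_" prefix, so resolution goes straight to the
-- current database.
CREATE PROCEDURE usp_GetCustomers
AS
SELECT * FROM Customers
GO
```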
11. Before you are done with your stored procedure code, review it for any
unused code that you may have forgotten to remove while you were making
changes, and remove it. Unused code just adds unnecessary bloat to your stored
procedures. [6.5, 7.0, 2000] Added 8-15-2000
12. For best performance, all objects that are called within the same stored
procedure should all be owned by the same owner, preferably dbo. If they
are not, then SQL Server must perform name resolution on the objects if the
object names are the same but the owners are different. When this happens, SQL
Server cannot reuse the stored procedure's in-memory plan; instead, it must
recompile the stored procedure, which hinders performance. [7.0, 2000] Added
10-12-2000
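A short sketch of the guideline (the procedure and table names are
hypothetical):

```sql
-- Every referenced object is owner-qualified as dbo, so no per-user
-- name resolution is needed and the cached plan can be reused.
CREATE PROCEDURE dbo.usp_GetOrderTotals
AS
SELECT o.OrderID, SUM(d.Quantity * d.UnitPrice) AS Total
FROM dbo.Orders o
JOIN dbo.OrderDetails d ON d.OrderID = o.OrderID
GROUP BY o.OrderID
GO
```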
13. When you need to execute a string of Transact-SQL, you should use
the sp_executesql stored procedure instead of the EXECUTE statement.
Sp_executesql offers two major advantages over EXECUTE. First, it supports
parameter substitution, which gives you more options when creating your code.
Second, it creates query execution plans that are more likely to be reused by SQL
Server, which in turn reduces overhead on the server, boosting performance.
Sp_executesql executes a string of Transact-SQL in its own self-contained batch.
When it is run, SQL Server compiles the code in the string into an execution plan
that is separate from the batch that contained the sp_executesql and its string.
Learn more about how to use sp_executesql in the SQL Server Books Online.
[7.0, 2000] Added 3-7-2001
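A brief sketch of the parameterized form, using the pubs authors table (the
parameter name is arbitrary):

```sql
-- The parameterized string lets SQL Server reuse one execution plan
-- for different @name values, unlike a concatenated EXECUTE string.
EXEC sp_executesql
    N'SELECT au_id, au_lname FROM authors WHERE au_lname = @name',
    N'@name varchar(40)',
    @name = 'Ringer'
```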
14. SQL Server will automatically recompile a stored procedure if any of the
following happens:
· If you include a WITH RECOMPILE clause in a CREATE PROCEDURE or
EXECUTE statement.
· If you run sp_recompile for any table referenced by the stored procedure.
· If any schema changes occur to any of the objects referenced in the
stored procedure. This includes adding or dropping rules, defaults, and
constraints.
· If new distribution statistics are generated.
· If you restore a database that includes the stored procedure or any of the
objects it references.
· If the stored procedure is aged out of SQL Server's cache.
· If an index used by the execution plan of the stored procedure is dropped.
· If a large number of INSERTs, UPDATEs, or DELETEs are made to a table
referenced by the stored procedure.
· If the stored procedure includes both DDL (Data Definition Language) and
DML (Data Manipulation Language) statements, and they are interleaved with
each other.
· If the stored procedure performs certain actions on temporary tables.
If you intermix DDL and DML many times in your stored procedure, this will
force a recompilation every time it happens, hurting performance.
To prevent unnecessary stored procedure recompilations, you should include all
of your DDL statements at the beginning of the stored procedure so they are not
intermingled with DML statements.
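A sketch of the guideline (the procedure, temp table, and source table are
hypothetical):

```sql
CREATE PROCEDURE dbo.usp_OrderReport
AS
-- All DDL first: create the temp table before any DML runs.
CREATE TABLE #totals (OrderID int, Total money)

-- All DML afterwards; no DDL is interleaved, so no mid-procedure
-- recompilation is triggered.
INSERT INTO #totals (OrderID, Total)
SELECT OrderID, SUM(Quantity * UnitPrice)
FROM dbo.OrderDetails
GROUP BY OrderID

SELECT * FROM #totals ORDER BY Total DESC
GO
```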
Unnecessary Stored Procedure Recompilations Due to Specific Temporary Table
Operations
Improper use of temporary tables in a stored procedure can force them to be
recompiled every time the stored procedure is run. Here's how to prevent this
from happening:
· Any references to temporary tables in your stored procedure should only
refer to tables created by that stored procedure, not to temporary tables
created outside your stored procedure, or in a string executed using either
the sp_executesql or the EXECUTE statement.
· All of the statements in your stored procedure that include the name of a
temporary table should appear, syntactically, after the temporary table is
created.
· The stored procedure should not declare any cursors that refer to a
temporary table.
· Any statements in a stored procedure that refer to a temporary table
should precede any DROP TABLE statement found in the stored procedure.
· The stored procedure should not create temporary tables inside a control-
of-flow statement.
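The rules above can be sketched in one procedure (the names are hypothetical):

```sql
CREATE PROCEDURE dbo.usp_TempTableDemo
AS
-- Created by this procedure itself, unconditionally (not inside
-- an IF/ELSE or other control-of-flow statement).
CREATE TABLE #work (id int)

-- Every reference appears after the CREATE ...
INSERT INTO #work (id) VALUES (1)
SELECT id FROM #work

-- ... and before the DROP.
DROP TABLE #work
GO
```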
17. Stored procedures can better boost performance if they are called via
Microsoft Transaction Server (MTS) instead of being called directly from your
application. A stored procedure can be reused from the procedure cache only if
the connection settings calling the stored procedure are the same. If different
connections call a stored procedure, SQL Server must load a separate copy of the
stored procedure for each connection, which somewhat defeats the purpose of
stored procedures. But if the same connection calls a stored procedure, it can be
used over and over from the procedure cache. The advantage of Transaction
Server is that it reuses connections, which means that stored procedures can be
reused more often. If you write an application where every user opens their own
connection, then stored procedures cannot be reused as often, reducing
performance. [7.0, 2000] Added 10-12-2000
19. If you use input parameters in your stored procedures, you should
validate all of them at the beginning of your stored procedure. This way, if
there is a validation problem and the client application needs to be notified of
the problem, it happens before any stored procedure processing takes place,
preventing wasted effort and boosting performance. [6.5, 7.0, 2000] Added 10-
12-2000
20. When your application executes a stored procedure, call it by its fully
qualified name, such as:
exec dbo.myProcedure
instead of just:
exec myProcedure
Why? There are a couple of reasons, one of which relates to performance. First,
using fully qualified names helps to eliminate any potential confusion about which
stored procedure you want to run, helping to prevent bugs and other potential
problems. But more importantly, doing so allows SQL Server to access the stored
procedure's execution plan more directly, which in turn speeds up the
performance of the stored procedure. Yes, the performance boost is very small,
but if your server is running tens of thousands or more stored procedures every
hour, these little time savings can add up. [7.0, 2000] Added 3-7-2001
21. If a stored procedure needs to return only a single value, and not a
recordset, consider returning the single value as an OUTPUT parameter.
While OUTPUT parameters are generally used for error-checking, they can
actually be used for any reason you like. Returning a single value as an OUTPUT
parameter is faster than returning a single value as part of a recordset. [6.5,
7.0, 2000] Added 8-1-2001
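A minimal sketch of returning a single value through an OUTPUT parameter, using
the pubs authors table (the procedure name is hypothetical):

```sql
CREATE PROCEDURE dbo.usp_GetAuthorCount
    @count int OUTPUT
AS
-- Return the value through the parameter, not as a one-row recordset.
SELECT @count = COUNT(*) FROM authors
GO

-- Calling it:
DECLARE @c int
EXEC dbo.usp_GetAuthorCount @count = @c OUTPUT
```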