*****
If you need to store large strings of data, and they are less than 8,000 characters, use a
VARCHAR data type instead of a TEXT data type. TEXT data types have extra overhead that
drags down performance. [2000, 2005, 2008] Updated 2-3-2009
*****
Don't use the NVARCHAR or NCHAR data types unless you need to store 16-bit character
(Unicode) data. They take up twice as much space as VARCHAR or CHAR data types,
increasing server I/O and wasting space in your buffer cache. [2000, 2005, 2008]
Updated 2-3-2009
*****
If the text data in a column varies greatly in length, use a VARCHAR data type instead of a
CHAR data type. The amount of space saved by using VARCHAR over CHAR on variable
length columns can greatly reduce I/O reads and the cache memory used to hold data, improving overall
SQL Server performance.
Another advantage of using VARCHAR over CHAR columns is that sorts performed on
VARCHAR columns are generally faster than on CHAR columns. This is because the entire
width of a CHAR column needs to be sorted. [2000, 2005, 2008] Updated 2-3-2009
*****
If a column's data does not vary widely in length, consider using a fixed-length CHAR field
instead of a VARCHAR. While it may take up a little more space to store the data, processing
fixed-length columns is faster in SQL Server than processing variable-length columns. [2000,
2005, 2008] Updated 2-3-2009
*****
Always choose the smallest data type you need to hold the data you need to store in a column.
For example, if all you are going to be storing in a column are the numbers 1 through 10, then
the TINYINT data type is more appropriate than the INT data type. The same goes for CHAR
and VARCHAR data types. Don't specify more characters in character columns than you need.
This allows you to store more rows in your data and index pages, reducing the amount of I/O
needed to read them. It also reduces the amount of data moved from the server to the client,
reducing network traffic and latency. And last of all, it reduces the amount of wasted space in
your buffer cache. [2000, 2005, 2008] Updated 2-3-2009
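To illustrate, here is a minimal sketch of a table definition following this advice (the table and column names are hypothetical):

CREATE TABLE OrderRating (
    OrderID INT NOT NULL,
    Rating TINYINT NOT NULL,    -- only the values 1 through 10 are stored, so TINYINT (1 byte) is enough
    Comments VARCHAR(200) NULL  -- sized to what is actually needed, not an arbitrary large width
)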
*****
If you have a column that is designed to hold only numbers, use a numeric data type, such as
INTEGER, instead of a VARCHAR or CHAR data type. Numeric data types generally require
less space to hold the same numeric value as does a character data type. This helps to reduce the
size of the columns, and can boost performance when the column is searched (WHERE clause),
joined to another column, or sorted. [2000, 2005, 2008] Updated 2-3-2009
*****
Avoid using FLOAT or REAL data types for primary keys, as they add unnecessary overhead
that hurts performance. Use one of the integer data types instead. [2000, 2005, 2008] Updated 2-
3-2009
*****
When specifying data types during table creation, always specify NULL or NOT NULL for each
column. If you don't, then the column will default to NOT NULL if the ANSI NULL DEFAULT
database option is not selected (the default), and will default to NULL if the ANSI NULL
DEFAULT database option is selected.
For best performance, and to reduce potential code bugs, columns should ideally be set to NOT
NULL. For example, use of the IS NULL keywords in the WHERE clause makes that portion of
the query non-sargable, which means that portion of the query cannot make good use of an index.
[2000, 2005, 2008] Updated 2-3-2009
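As a hedged sketch of this advice (hypothetical table and column names), declare the column NOT NULL with a default so queries can filter on a real value rather than relying on IS NULL:

CREATE TABLE Customers (
    CustomerID INT NOT NULL PRIMARY KEY,
    MiddleName VARCHAR(30) NOT NULL DEFAULT ''  -- a default value instead of allowing NULLs
)
-- Sargable search on a real value:
SELECT CustomerID FROM Customers WHERE MiddleName = ''
-- Compare with a nullable column, where WHERE MiddleName IS NULL would be needed instead.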
*****
If you are using fixed length columns (CHAR, NCHAR) in your table, consider avoiding storing
NULLs in them. If you do, the entire amount of space dedicated to the column will be used up.
For example, if you have a fixed length column of 255 characters, and if you place a NULL in it,
then 255 characters have to be stored in the database. This is a large waste of space that will
cause SQL Server to have to perform extra disk I/O to read data pages. It also wastes space in the
data cache buffer. Both of these contribute to reduced SQL Server performance.
Instead of using NULLs, use a coding scheme in your databases that substitutes an agreed-upon
placeholder value for missing data (for example, 'NA' in character columns or 0 in numeric
columns). Such a scheme provides the benefits of using NULLs, but without the drawbacks.
If you really must use NULLs, use a variable length column instead of a fixed length column.
Variable length columns only use a very small amount of space to store a NULL. [2000, 2005,
2008] Updated 2-3-2009
*****
If you use the CONVERT function to convert a value to a variable length datatype, such as
VARCHAR, always specify the length of the variable datatype. If you do not, SQL Server
assumes a default length of 30. Ideally, you should specify the shortest length to accomplish the
required task. This helps to reduce memory use and SQL Server resources. [2000, 2005, 2008]
Updated 2-3-2009
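For example (the table and column names are hypothetical):

-- Specify the length explicitly; without it, SQL Server assumes a length of 30.
SELECT CONVERT(VARCHAR(10), OrderDate, 101) AS OrderDateText
FROM Orders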
*****
Generally, using computed columns in a table is not recommended because it does not follow the
standard rules of normalization. But, it is sometimes more efficient overall to use computed
columns in a table rather than re-computing the same data repeatedly in queries. This is
especially true if you are running the same query over and over against your data that performs
the same calculations over and over. By performing the calculations in the table, it can reduce the
amount of work performed by a query each time it is run. You have to determine for yourself
where the bottleneck in performance is, and act accordingly. If the bottleneck is in INSERTS and
UPDATES, then using calculated columns may not be a good idea. But if your SELECT
statements are the bottleneck, then using calculated columns may pay off. [2000, 2005, 2008]
Updated 2-3-2009
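A minimal sketch of a computed column, assuming a hypothetical order-detail table where the same line total would otherwise be recalculated in many queries:

CREATE TABLE OrderDetail (
    OrderID INT NOT NULL,
    Quantity INT NOT NULL,
    UnitPrice MONEY NOT NULL,
    LineTotal AS (Quantity * UnitPrice)  -- computed in the table, not repeated in every query
)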
*****
Avoid using the bigint data type unless you really need its additional storage capacity. The bigint
data type uses 8 bytes of memory versus 4 bytes for the int data type. [2000, 2005, 2008]
Updated 2-3-2009
*****
Avoid using the SQL Server sql_variant datatype. Besides being a performance hog, it
significantly limits what you can do with the data stored as a sql_variant. For example,
sql_variant columns cannot be part of primary or foreign keys; can be used in indexes and
unique keys only if they are shorter than 900 bytes; cannot have an identity property; cannot be
part of a computed column; must be converted to another datatype when moving data to objects
with other datatypes; are automatically converted to nvarchar(4000) when accessed by client
applications using the SQL Server 7.0 OLE DB or ODBC providers; are not supported by the
LIKE predicate in the WHERE clause; cannot be concatenated; and don't work with some
functions. [2000, 2005, 2008] Updated 2-3-2009
*****
Avoid using date data types as a primary key. From a performance perspective, it is more
efficient to use a data type that uses less space. For example, the DATETIME datatype uses 8
bytes of space, while the INT datatype only takes up 4 bytes. The less space used, the smaller the
table and index, and the less I/O overhead that is required to access the primary key. [2000,
2005, 2008] Updated 2-3-2009
*****
If you are creating a column that you know will be subject to many sorts, consider making the
column integer-based and not character-based. This is because SQL Server can sort integer data
faster than character data. [2000, 2005, 2008] Updated 2-3-2009
*****
Take care when using Unicode data in your queries, as it can affect query performance. A classic
problem is related to an application passing in Unicode literals, while the column searched in the
database table is non-Unicode. This, of course, may be vice versa depending on your scenario.
Here is an example. The DB column "orgname_name" has been indexed, and is of type varchar.
The code below performs OK (so we think), but because the value passed in is Unicode, SQL
Server must convert the column and performs an index scan operation:

|--Index Scan(OBJECT:([corpsys].[dbo].[Organisation_Name].[Organisation_Name]),
WHERE:(Convert([Organisation_Name].[orgname_name])=[@myvar]))

Table 'Organisation_Name'.
Scan count 1,
logical reads 1145,
physical reads 0,
read-ahead reads 0.

If we change this around slightly, using a varchar variable instead (no Unicode
conversion), we see this:

declare @myvar varchar(200)

|--Index Seek(OBJECT:([corpsys].[dbo].[Organisation_Name].[Organisation_Name_nameix]),

Table 'Organisation_Name'.
Scan count 1,
logical reads 9,
physical reads 0,
read-ahead reads 0.

Instead of 1,145 logical reads, there are only 9, a significant improvement. [7.0, 2000, 2005]
Updated 10-16-2005. Contributed by www.chriskempster.com.
SQL Profiler can be used to monitor and log the workload of a SQL Server. This logged
workload can then be submitted to the SQL Server Index Tuning Wizard so index
changes can be made to help performance if necessary. SQL Profiler and Index Tuning
Wizard help administrators achieve optimal indexing. Using these tools periodically will
keep SQL Server performing well, even if the query workload changes over time.
SQL Server includes a complete set of System Monitor objects and counters to provide
information for monitoring and analyzing the operations of SQL Server; the key counters are
worth watching on a regular basis.
SQL Server can slow down and become unreliable as the number of users increase or the size of
the database grows. The first instinct is to simply throw more money at the server hardware.
With the database design as the root cause however, this usually results in minimal, incremental
improvements in database performance. It is almost always the case that deficient database
design is the root cause of poor performance and poor data integrity.
PCA has extensive experience tuning SQL Server applications for optimal performance and
reliability. Our expert database engineers can find the SQL Server slowdowns and apply proven
remedial database design techniques to substantially increase database performance and improve
data integrity.
If your SQL Server database solution is up and running, but you are experiencing significant data
reliability and/or performance problems, PCA can help you discover the underlying design
and/or implementation causes, and provide proven SQL Server Performance Optimization
techniques to improve overall system performance and reliability.
Don't think that performance tuning your SQL Server applications is relegated to the end of
the development process. If you want your SQL Server-based applications to scale and run
at their full potential, you must begin considering scalability and performance issues during
the early stages of your application's development.
If you have been a DBA or SQL developer for very long, then you have probably run across
some slow SQL Server-based applications. And often when this happens, everybody begins
blaming everybody else for the problem. It's the network. It's the server hardware. It's SQL
Server. It's the users. It's the database. And this goes on and on, but unfortunately, blame
doesn't fix slow applications. The cure to most slow SQL Server-based applications is
prevention, which includes careful up front analysis of the user's needs, thoughtful design,
optimal coding, and appropriate implementation.
For any application, SQL Server or otherwise, scalability and performance have to be built in
from the very beginning. Once the application is rolled out, it is very difficult and expensive
to resolve most scalability and performance issues.
In this article you are going to learn the fundamentals of how to design, code, and
implement scalable and performance optimized SQL Server applications. You won't learn
everything, as that would take an entire book. The focus of this article is on learning the
very minimum you must know in order to produce scalable and performance tuned SQL
Server-based applications. Here's what you will learn:
What Every Developer and DBA Must Know About SQL Server Performance Tuning
At the very least, if you take advantage of the advice and information in this article, you will
find that performance tuning your SQL Server-related applications is not as big a mystery as
you might think. So let's get to work.
What Every Developer Must Know About SQL Server Performance Tuning
As a developer, there are some overriding principles on which to proceed. This section
introduces these principles. Keep these in mind as you read about specific performance
tuning details discussed later in this article, and whenever performance tuning your SQL
Server applications.
While it is virtually impossible to control every factor that influences SQL Server's scalability
and performance, what you can do is make the most of what you can control.
When performing tests, always proceed scientifically, only testing one dependent variable at
a time. For example, if you suspect that you need to add an index to a table to boost
performance, but you are not sure which one, or which type is best, experiment with only
one change at a time, testing each change individually to see if it produces the results you
expect. If you change more than one thing at a time, you won't know which change you
made worked or didn't work. This goes for all testing, whether it is adding indexes, making
SQL Server configuration changes, or testing various hardware configurations.
Always try to test under realistic conditions. This means use "real" data, testing against the
largest expected data sets, and using hardware similar to the hardware that will be used
when the application goes into production. If you don't, you may be surprised that while
your application works well for 10 simultaneous users during testing, it fails miserably
when 500 users are online.
To tune effectively, you need to fully understand the programming language used to write your
application, database design, application design, Transact-SQL, how SQL Server stores and
indexes data, and how networks and server hardware really work. The better understanding
you have of the basics of the applicable technologies used to develop and roll out your
application, the better position you will be in to understand what is causing performance
and scalability problems and how to resolve them. Learn all you can.
Most slow applications are slow because of poor up front design,
not because of slow hardware. The reason hardware is often blamed is because
performance problems often don't show themselves until after the application is rolled out.
And since the application's design can't be changed at this time, about the only thing you
can try to help boost performance is to throw hardware at it. While hardware can help, it
usually doesn't fully resolve the problem, and this is why hardware is often blamed for slow
performance. While hardware can sometimes be an issue, most likely it is not.
In order to prevent your server hardware from being a drag on your SQL Server-based
application (which it can if it is inappropriately selected or configured), let's take a brief look
at some of the most common hardware selection and tuning issues.
Selecting Hardware
Selecting the optimum hardware for your SQL Server-based application depends on a
variety of factors, such as the size of the database, the number of users, how the database
is used (OLTP or OLAP), and others. While there is no sure-fire formula for sizing server
hardware, the best way to get a feel for sizing is to test your application early in the
development stage. Ah, testing is mentioned again. That's right. While many experienced
DBAs can probably give you a good estimate on the optimum hardware you need, only
through realistic testing will you know for sure what hardware is required to meet your
application's needs.
When it comes to server hardware, here are some things to keep in mind:
CPU: Always purchase a server with the ability to expand its number of CPUs. For
example, if testing indicates that a single CPU server will be adequate, purchase a
server with at least room for two CPUs, even if you only use one of the slots. The
same goes for larger servers with four or more CPUs. Always leave room for growth.
Memory: This is probably the most significant piece of hardware that affects SQL
Server's performance. Ideally, your entire database should fit into RAM.
Unfortunately, this is not often possible. At the very minimum, try to get enough
RAM to hold the largest table you expect to have, and if you can afford it, get all the
RAM your server can handle, which is often 2GB or more. There is no such thing as
too much RAM.
I/O Subsystem: After RAM, the I/O subsystem is the most important piece of
hardware to affect SQL Server's performance. At the very minimum, purchase
hardware-based RAID for your databases. As a rule of thumb, you will want to purchase
more, smaller drives, not fewer, larger drives in your array. The more disks that are
in an array, the faster I/O will be.
Network Connection: At the server, have at least one 100Mbps network card, and it
should be connected to a switch. Ideally, you should have two network cards in the
server connected to a switch in full-duplex mode.
For best performance on a server, SQL Server should be the only application running on the
server, other than management utilities. Don't try to save a few bucks by putting your IIS
or MTS server on the same server as SQL Server. Not only does this hurt SQL Server's
performance, but it also makes it more difficult to performance tune and troubleshoot SQL
Server.
For the most part, SQL Server is self-tuning. What does this mean? It means that SQL
Server observes what is running on itself, and automatically makes internal adjustments
which, for the most part, keep SQL Server running as optimally as possible given the tasks
at hand and the given hardware.
When you perform performance testing on SQL Server, keep in mind that SQL Server can
take some time before it adjusts itself optimally. In other words, the performance you get
immediately after starting the SQL Server service, and the performance you get a couple of
hours later after a typical workload has been running, can be different. Always perform your
testing after SQL Server has had a chance to adjust itself to your workload.
There are 36 SQL Server configuration options that can be changed using either the
Enterprise Manager or the sp_configure stored procedure. Unless you have a lot of
experience tuning SQL Server, I don't recommend you change any of SQL Server's settings.
As a novice, you may make a change that could in fact reduce performance. This is because
when you change a setting, you are "hard-coding" the setting from then on. SQL Server has
the ability to change its settings on the fly, based on the current workload. But once you
"hard-code" a setting, you partially remove SQL Server's ability to tune itself.
If after serious consideration you feel that making a change to one or more SQL Server
configuration settings can boost performance in your particular environment, then you will
want to proceed slowly and cautiously. Before you make the setting change, you will first
want to get a good baseline on the SQL Server's performance, under a typical workload,
using a tool such as Performance Monitor (discussed later). Then make only one change at a
time. Never make more than one change at a time, because if you do, you won't know
which change, if any of them, made a difference.
Once the one change is made, again measure SQL Server's performance under the same
workload to see if performance was actually boosted. If it wasn't, which will often be the
case, then change back to the default setting. If there was a performance boost, then
continue to check to see if the boost in performance continues under other workloads the
server experiences over time. Your later testing may show that your change helps some
workloads, but hinders others. This is why changing most configuration settings is not
recommended.
In any event, if your application is suffering from a performance-related issue, the odds of a
configuration change resolving it are quite low.
One of the first decisions you must make when designing an n-tier application is the selection of
the logical and physical design. Of the two, the physical design is where most of the mistakes
are made when it comes to performance. This is because this is where the theory (based on
the logical design) has to be implemented in the real world. And just like anything else, you
have many choices to make. And many of these choices don't lend themselves to scalability
or high performance.
For example, do you want to implement a physical two-tier implementation with fat clients,
a physical two-tier implementation with a fat server, a physical three-tier implementation,
an Internet implementation, or some other implementation? Once you answer this question,
then you must ask yourself, what development language will be used, what browser, will
you use Microsoft Transaction Server (MTS), will you use Microsoft Message Queue Server
(MSMQ), and on and on.
Each of these many decisions can and will affect performance and scalability. Because there
are so many options, it is again important to test potential designs early in the design stage,
using rapid prototyping, to see which implementation will best meet your user's needs.
More specifically, as you design your physical implementation, try to follow these general
recommendations to help ensure scalability and optimal performance in your application:
Don't maintain state (don't store data from the database) in the business services
tier. Maintain state in the database as much as possible.
Don't create complex or deep object hierarchies. The creation and use of complex
classes or a large number of objects used to model complex business rules can be
resource intensive and reduce the performance and scalability of your application.
This is because the memory allocation when creating and freeing these objects is
costly.
If your application runs queries against SQL Server that by nature are long, design
the application to be able to run queries asynchronously. This way, one query does
not have to wait for the next before it can run. One way to build this functionality
into your n-tier application is to use the Microsoft Message Queue Server (MSMQ).
While following these suggestions won't guarantee a scalable and fast performing
application, they are a good first start.
As always, you will want to test your design as early as possible using realistic data. This
means you will need to develop prototype databases with sample data, and test the design
using the type of activity you expect to see in the database once production starts.
One of the first design decisions you must make is whether the database will be used for
OLTP or OLAP. Notice that I said "or". One of the biggest mistakes you can make when
designing a database is to try to meet the needs of both OLTP and OLAP. These two types of
applications are mutually exclusive if you are interested in any sense of high performance
and scalability.
OLTP databases are generally highly normalized, helping to reduce the amount of data that
has to be stored. The less data you store, the less I/O SQL Server will have to perform, and
the faster database access will be. Transactions are also kept as short as possible in order
to reduce locking conflicts. And last of all, indexing is generally minimized to reduce the
overhead of high levels of INSERTs, UPDATEs, and DELETEs.
OLAP databases, on the other hand, are highly de-normalized. In addition, transactions are
not used, and because the database is read-only, record locking is not an issue. And of
course, heavy indexing is used in order to meet the wide variety of reporting needs.
As you can see, OLTP and OLAP databases serve two completely different purposes, and it is
virtually impossible to design a database to handle both needs. While OLAP database design
is out of this article's scope, I do want to mention a couple of performance-related
suggestions in regard to OLTP database design.
When you go through the normalization process when designing your OLTP databases, your
initial goal should be to fully normalize it according to the three general principles of
normalization. The next step is to perform some preliminary performance testing, especially
if you foresee having to perform joins on four or more tables at a time. Be sure to test using
realistic sample data.
If performance is acceptable, then don't worry about having to join four or more tables in a
query. But if performance is not acceptable, then you may want to do some selective de-
normalization of the tables involved in order to reduce the number of joins used in the
query, and to speed performance.
It is much easier to catch a problem in the early database design stage, rather than after
the finished application has been rolled out. De-normalization of tables after the application
is complete is nearly impossible. One word of warning. Don't be tempted to de-normalize
your database without thorough testing. It is very hard to deduce logically what de-
normalization will do to performance. Only through realistic testing can you know for sure if
de-normalization will gain you anything in regards to performance.
How you code your application has a significant bearing on performance and scalability,
just as the database design and the overall application design affect performance and
scalability. Sometimes, something as simple as choosing one coding technique over another
can make a significant difference. Rarely is there only one way to code a task, but often there
is only one way to code a task for optimum performance and scalability.
What I want to do in this section is focus on some essential techniques that can affect the
performance of your application and SQL Server.
Since I don't know what development language you will be using, I am going to assume
here that you will be using Microsoft's ADO (Active Data Objects) object model to access
SQL Server from your application. The examples I use here should work for most Visual
Basic and ASP developers. So let's just dive in and look at some specific techniques you
should implement in your application code when accessing SQL Server data to help ensure
high performance.
The easiest way to manipulate data from your application is to use ADO's various methods,
such as rs.AddNew, rs.Update, or rs.Delete. While using these methods is easy to learn and
implement, you pay a relatively steep penalty in overhead for using them. ADO's methods
often create slow cursors and generate large amounts of network traffic. If your application
is very small, you would never notice the difference. But if your application has much data
at all, your application's performance could suffer greatly.
Another way to manipulate data stored in SQL Server using ADO is to use dynamic SQL
(also sometimes referred to as ad hoc queries). Here, what you do is send Transact-SQL in
the form of strings from ADO in your application to be run on SQL Server. Using dynamic
SQL is generally much faster than using ADO's methods, although it does not offer the
greatest performance. When SQL Server receives the dynamic SQL from your ADO-based
application, it has to compile the Transact-SQL code, create a query plan for it, and then
execute it. Compiling the code and creating the query plan the first time takes a little
overhead. But once the Transact-SQL code has been compiled and a query plan created, it
can be reused over and over assuming the Transact-SQL code sent later is nearly identical,
which saves overhead.
For optimal performance, you will want to use ADO to call stored procedures on your
server to perform all your data manipulation. The advantages of stored procedures are
many. Stored procedures are already pre-compiled and optimized, so this step doesn't have
to be repeated every time the stored procedure is run. The first time a stored procedure is
run, a query plan is created and stored in SQL Server's memory, so it can be reused, saving
even more time. Another benefit of stored procedures is that they help reduce network
traffic and latency. When your application's ADO code calls a stored procedure on SQL
Server, it makes a single network call. Then any required data processing is performed on
SQL Server, where data processing is most efficiently performed, and then if appropriate, it
will return any results to your application. This greatly reduces network traffic and increases
scalability and performance.
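As a simple sketch of this approach (the procedure, table, and column names are hypothetical), the data access below is wrapped in a stored procedure that the application calls with a single network round trip:

CREATE PROCEDURE usp_GetOrdersByCustomer
    @CustomerID INT
AS
SELECT OrderID, OrderDate, OrderTotal
FROM Orders
WHERE CustomerID = @CustomerID
GO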
While stored procedures handle basic data manipulation like a champ, they can also handle
much more very well. Stored procedures can run virtually any Transact-SQL code, and since
Transact-SQL code is the most efficient way to manipulate data, all of your application's
data manipulations should be done inside of stored procedures on SQL Server, not in COM
components in the business-tier or on the client.
When you use ADO to execute stored procedures on SQL Server, you have two major ways
to proceed. You can use ADO to call the Refresh method of the Parameters collection in
order to save you a little coding. ADO needs to know what parameters are used by the
stored procedure, and the Refresh method can query the stored procedure on SQL Server to
find out the parameters. But as you might expect, this produces additional network traffic
and overhead. While it takes a little more coding, a more efficient way to call a SQL Server
stored procedure is to create the parameters explicitly in your code. This eliminates the
extra overhead caused by the Refresh method and speeds up your application.
For optimum performance, COM objects should be compiled as in-process DLLs (which is
required if they are to run under MTS). You should always employ early binding when
referencing COM objects, and create them explicitly, not implicitly.
Choosing the appropriate data types can affect how quickly SQL Server can SELECT,
INSERT, UPDATE, and DELETE data, and choosing the most optimum data type is not
always obvious. Here are some suggestions you should implement when creating physical
SQL Server tables to help ensure optimum performance.
Always choose the smallest data type you need to hold the data you need to store in
a column. For example, if all you are going to be storing in a column are the
numbers 1 through 10, then the TINYINT data type is more appropriate than the INT
data type. The same goes for CHAR and VARCHAR data types. Don't specify more
characters for character columns than you need. This allows SQL Server to store
more rows in its data and index pages, reducing the amount of I/O needed to read
them. Also, it reduces the amount of data moved from the server to the client,
reducing network traffic and latency.
If the text data in a column varies greatly in length, use a VARCHAR data type
instead of a CHAR data type. Although the VARCHAR data type has slightly more
overhead than the CHAR data type, the amount of space saved by using VARCHAR
over CHAR on variable length columns can reduce I/O, improving overall SQL Server
performance.
Don't use the NVARCHAR or NCHAR data types unless you need to store 16-bit
character (Unicode) data. They take up twice as much space as VARCHAR or CHAR
data types, increasing server I/O overhead.
If you need to store large strings of data, and they are less than 8,000 characters,
use a VARCHAR data type instead of a TEXT data type. TEXT data types have extra
overhead that drags down performance.
If you have a column that is designed to hold only numbers, use a numeric data
type, such as INTEGER, instead of a VARCHAR or CHAR data type. Numeric data
types generally require less space to hold the same numeric value as does a
character data type. This helps to reduce the size of the columns, and can boost
performance when the column is searched (WHERE clause) or joined to another
column.
Keep the code in your triggers to the very minimum to reduce overhead. The more
code that runs in the trigger, the slower each INSERT, UPDATE, and DELETE that
fires it will be.
Don't use triggers to perform tasks that can be performed using more efficient
techniques. For example, don't use a trigger to enforce referential integrity if SQL
Server's built-in referential integrity is available to accomplish your goal. The same
goes if you have a choice between using a trigger or a CHECK constraint to enforce
rules or defaults. You will generally want to choose a CHECK constraint as they are
faster than using triggers when performing the same task.
Try to avoid rolling back triggers because of the overhead involved. Instead of letting
the trigger find a problem and rollback a transaction, catch the error before it can
get to the trigger (if possible based on your code). Catching an error early (before
the trigger fires) consumes fewer server resources than letting the trigger roll back.
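For example, the advice above about preferring constraints over triggers might look like this minimal sketch (the table, constraint, and rule are hypothetical):

-- A CHECK constraint enforces the rule directly, without trigger overhead.
ALTER TABLE OrderDetail
ADD CONSTRAINT CK_OrderDetail_Quantity CHECK (Quantity > 0)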
Don't return more columns or rows of data to the client than absolutely necessary.
This just increases disk I/O on the server and network traffic, both of which hurt
performance. In SELECT statements, don't use SELECT * to return rows, always
specify in your SELECT statement exactly which columns are needed to be returned
for this particular query, and not a column more. In most cases, be sure to include a
WHERE clause to reduce the number of rows sent to only those rows the client
needs to perform the task immediately at hand.
If your application allows users to run queries, but you are unable in your application
to easily prevent users from returning hundreds, even thousands of unnecessary
rows of data they don't need, consider using the TOP operator within the SELECT
statement. This way, you can limit how many rows are returned, even if the user
doesn't enter any criteria to help reduce the number of rows returned to the client.
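A short sketch of these last two suggestions together (the table, columns, and limit are hypothetical):

SELECT TOP 100 OrderID, OrderDate  -- name only the columns needed; TOP caps the rows returned
FROM Orders
WHERE CustomerID = 42              -- a WHERE clause limits rows to what the client needs
    AND OrderDate >= '20090101'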
If you need to perform row-by-row operations, try to find another method to perform the
task. Some options are to perform row-by-row tasks at the client instead of the server,
using tempdb tables at the server, or using a correlated sub-query.
Unfortunately, these are not always possible, and you have to use a cursor. If you find it
impossible to avoid using cursors in your applications, then perhaps one of these
suggestions will help.
SQL Server offers you several different types of cursors, each with its different
performance characteristics. Always select the cursor with the least amount of
overhead that has the features you need to accomplish your goals. The most efficient
cursor you can choose is the fast forward-only cursor.
When using a server-side cursor, always try to fetch as small a result set as possible.
This includes fetching only those rows and columns the client needs immediately.
The smaller the cursor, no matter what type of server-side cursor it is, the fewer
resources it will use, and performance will benefit.
When you are done using a cursor, don't just CLOSE it, you must also DEALLOCATE
it. Deallocation is required to free up the SQL Server resources used by the cursor. If
you only CLOSE the cursor, locks are freed, but SQL Server resources are not. If you
don't DEALLOCATE your cursors, the resources used by the cursor will stay allocated,
degrading the performance of your server until they are released.
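Putting these cursor suggestions together, here is a hedged sketch (hypothetical table and columns):

DECLARE @OrderID INT
DECLARE OrderCursor CURSOR FAST_FORWARD FOR   -- fast forward-only: the lowest-overhead cursor type
    SELECT OrderID FROM Orders WHERE CustomerID = 42
OPEN OrderCursor
FETCH NEXT FROM OrderCursor INTO @OrderID
WHILE @@FETCH_STATUS = 0
BEGIN
    -- row-by-row work goes here
    FETCH NEXT FROM OrderCursor INTO @OrderID
END
CLOSE OrderCursor        -- frees the locks
DEALLOCATE OrderCursor   -- frees the SQL Server resources used by the cursor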
If you have two or more tables that are frequently joined together, then the columns
used for the joins should have an appropriate index. If the columns used for the joins
are not naturally compact, then consider adding compact surrogate keys to the tables
in order to reduce the size of the keys, thus decreasing read I/O during
the join process, and increasing overall performance. You will learn more about
indexing in the next section of this article.
For best performance, the columns used in joins should be of the same data types.
And if possible, they should be numeric data types rather than character types.
Avoid joining tables based on columns with few unique values. If columns used for
joining aren't mostly unique, then the SQL Server optimizer may perform a table scan
for the join, even if an index exists on the columns. For best performance, joins
should be done on columns that have unique indexes.
If you have to regularly join four or more tables to get the recordset you need,
consider denormalizing the tables so that the number of joined tables is reduced.
Often, by adding one or two columns from one table to another, joins can be
reduced.
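For instance (hypothetical tables), index the join columns and keep them the same compact, numeric data type on both sides:

CREATE INDEX IX_OrderDetail_OrderID ON OrderDetail (OrderID)

SELECT o.OrderID, d.Quantity
FROM Orders o
INNER JOIN OrderDetail d ON d.OrderID = o.OrderID   -- both columns are INT and indexed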
When a stored procedure is first executed (and it does not have the WITH RECOMPILE
option specified), it is optimized and a query plan is compiled and cached in SQL Server's
memory. If the same stored procedure is called again, it will use the cached query plan
instead of creating a new one, saving time and boosting performance. This may or may not
be what you want. If the query in the stored procedure is the same each time, then this is a
good thing. But if the query is dynamic (the WHERE clause changes substantially from one
execution of the stored procedure to the next), then this is a bad thing, as the query will not
be optimized when it is run, and the performance of the query can suffer.
If you know that your query will vary each time it is run from the stored procedure, you will
want to add the WITH RECOMPILE option when you create the stored procedure. This will
force the stored procedure to be re-compiled each time it is run, ensuring the query is
optimized each time it is run.
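A minimal sketch, assuming a hypothetical search procedure whose WHERE clause varies widely from one call to the next:

CREATE PROCEDURE usp_SearchOrders
    @Status VARCHAR(20)
WITH RECOMPILE   -- a fresh query plan is built on every execution
AS
SELECT OrderID, OrderDate
FROM Orders
WHERE OrderStatus = @Status
GO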
Always include in your stored procedures the statement, "SET NOCOUNT ON". If you don't
turn this feature on, then every time a SQL statement is executed, SQL Server will send a
response to the client indicating the number of rows affected by the statement. It is rare
that the client will ever need this information. Using this statement helps reduce the traffic
between the server and the client.
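For example (a hypothetical procedure):

CREATE PROCEDURE usp_UpdateOrderStatus
    @OrderID INT,
    @Status VARCHAR(20)
AS
SET NOCOUNT ON   -- suppresses the "rows affected" message sent back to the client
UPDATE Orders SET OrderStatus = @Status WHERE OrderID = @OrderID
GO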
Deadlocking can occur within a stored procedure when two user processes have locks on
separate objects and each process is trying to acquire a lock on the object that the other
process has. When this happens, SQL Server ends the deadlock by automatically choosing
one of the processes and aborting it, allowing the other process to continue. The aborted
transaction is rolled back and an error message is sent to the user of the aborted process.
To help avoid deadlocking in your SQL Server application, try to design your application
using these suggestions: 1) have the application access server objects in the same order
each time; 2) during transactions, don't allow any user input, collecting it before the
transaction begins; 3) keep transactions short and within a single batch; and 4) if
appropriate, use as low of an isolation level as possible for the user connection running the
transaction.
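A hedged sketch of suggestions 1, 3, and 4, using hypothetical tables: keep the transaction short and in a single batch, touch objects in the same order every time, and set an explicit, low isolation level:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
    UPDATE Customers SET CreditLimit = 5000 WHERE CustomerID = 42     -- always update Customers first
    UPDATE Orders SET OrderStatus = 'Approved' WHERE CustomerID = 42  -- then Orders, in every procedure
COMMIT TRANSACTION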
In this section we will take a brief look at how to answer the above questions.
Unfortunately, there is no absolute answer for every occasion. Like much of SQL Server
performance tuning and optimization, you may have to do some experimenting to find the
ideal indexes. So let's begin by looking at some general index creation guidelines, then we
will take a more detailed look at selecting clustered and non-clustered indexes.
As a general rule of thumb, don't automatically add indexes to a table because it seems like
the right thing to do. Only add indexes if you know that they will be used by the queries run
against the table. If you don't know what queries will be run against your table, then don't
add any indexes until you know for sure. It is too easy to make a guess on what queries will
be run, create indexes, and then later find out your guesses were wrong. You must know
the type of queries that will be run against your data, and then these need to be analyzed
to determine the most appropriate indexes, and then the indexes must be created and
tested to see if they really help or not.
The problem of selecting optimal indexes is often difficult for OLTP applications because they
tend to experience high levels of INSERT, UPDATE, and DELETE activity. While you need
good indexes to quickly locate records that need to be SELECTED, UPDATED, or DELETED,
you don't want every INSERT, UPDATE, or DELETE to result in too much overhead because
you have too many indexes. On the other hand, if you have an OLAP application that is
virtually read-only, then adding as many indexes as you need is not a problem because you
don't have to worry about INSERT, UPDATE, or DELETE activity. As you can see, how your
application is used makes a large difference in your indexing strategy.
Another thing to think about when selecting indexes is that the SQL Server Query Optimizer
may not use the indexes you select. If the Query Optimizer chooses not to use your
indexes, then they are a burden on SQL Server and should be deleted. So how come the
SQL Server Query Optimizer won't always use an index if one is available?
This is too large a question to answer in detail here, but suffice to say, sometimes it is
faster for SQL Server to perform a table scan on a table than it is to use an available index
to access data in the table. Two reasons this may happen are that the table is small
(not many rows), or that the column that was indexed isn't at least 95% unique. How do you
know if SQL Server won't use the indexes you create? We will answer this question a little
later when we take a look at how to use the SQL Server Query Analyzer later in this article.
Clustered indexes are ideal for queries that select by a range of values or where you
need sorted results. This is because the data is already presorted in the index for
you. Examples of this include when you are using BETWEEN, <, >, GROUP BY,
ORDER BY, and aggregates such as MAX, MIN, and COUNT in your queries.
Clustered indexes are good for queries that look up a record with a unique value
(such as an employee number) and when you need to retrieve most or all of the data
in the record. This is because the query is covered by the index.
Clustered indexes are good for queries that access columns with a limited number of
distinct values, such as a column that holds country or state data. But if column
data has little distinctiveness, such as columns with a yes or no, or male or female,
then these columns should not be indexed at all.
Clustered indexes are good for queries that use the JOIN or GROUP BY clauses.
Clustered indexes are good for queries where you want to return a lot of rows, not
just a few. This is because the data is in the index and does not have to be looked up
elsewhere.
Avoid putting a clustered index on columns that increment, such as an identity, date,
or similarly incrementing columns, if your table is subject to a high level of INSERTS.
Since clustered indexes force the data to be physically ordered, a clustered index on
an incrementing column forces new data to be inserted at the same page in the
table, creating a table hot spot, which can create disk I/O bottlenecks. Ideally, find
another column or columns to become your clustered index.
What can be frustrating about the above advice is that there might be more than one
column that should be clustered. But as we know, we can only have one clustered index per
table. What you have to do is evaluate all the possibilities (assuming more than one column
is a good candidate for a clustered index) and then select the one that provides the best
overall benefit.
Non-clustered indexes are best for queries that return few rows (including just one
row) and where the index has good selectivity (above 95%).
If a column in a table is not at least 95% unique, then most likely the SQL Server
Query Optimizer will not use a non-clustered index based on that column. Because of
this, don't add non-clustered indexes to columns that aren't at least 95% unique. For
example, a column with "yes" or "no" as the data won't be at least 95% unique.
Keep the "width" of your indexes as narrow as possible, especially when creating
composite (multi-column) indexes. This reduces the size of the index and reduces
the number of reads required to read the index, boosting performance.
If possible, try to create indexes on columns that have integer values instead of
characters. Integer values have less overhead than character values.
If you know that your application will be performing the same query over and over
on the same table, consider creating a covering index on the table. A covering index
includes all of the columns referenced in the query. Because of this, the index
contains the data you are looking for and SQL Server doesn't have to look up the
actual data in the table, reducing logical and/or physical I/O. On the other hand, if
the index gets too big (too many columns), this can increase I/O and degrade
performance.
An index is only useful to a query if the WHERE clause of the query matches the
column(s) that are leftmost in the index. So if you create a composite index, such as
"City, State", then a query such as "WHERE City = 'Houston'" will use the index, but
the query "WHERE STATE = 'TX'" will not use the index.
Generally, if a table needs only one index, make it a clustered index. If a table needs more
than one index, then you have no choice but to use non-clustered indexes. By following the
above recommendations, you will be well on your way to selecting the optimum indexes for
your tables.