
Sybase ASE 12.5 Performance and Tuning


Jeffrey Garbus
Eric Miner
Joel DuPlessis
Alvin Chang
Yuval Malchi

Wordware Publishing, Inc.

Sybase is a registered trademark of Sybase, Inc.

Other product names mentioned are used for identification purposes only and may be
trademarks of their respective companies.

All inquiries for volume purchases of this book should be addressed to Wordware Publishing,
Inc., at the above address. Telephone inquiries may be made by calling:

(972) 423-0090

Dedications

For my love
Jeff Garbus

To my mother, who's always wanted to see my name on a book; to my father, who still wants
to know what I do for a living; and to my wife and son, for putting up with me while I went
through the stress of deadlines.
Eric Miner

To all the friends and colleagues throughout the years who have given me opportunity to
laugh, cause to think, and reason to wonder.
Joel DuPlessis

For my family... For my friends...


Alvin Chang

To my parents, who were insanely brave to immigrate to the United States 15 years ago in
their mid-40s, with four kids and ten pieces of luggage. And to the Goldins, for always being
there.
Yuval Malchi

About the Authors

Jeff Garbus's background includes a bachelor's degree from Rensselaer Polytechnic Institute,
and work experience from PCs to mainframes and back again. He has many years of
Microsoft and Sybase SQL Server experience, with a special emphasis on assisting clients in
migrating from existing systems to pilot and large-scale projects. He is well known in the
industry, having spoken at user conferences and to user groups for many years, written
articles and columns for many magazines nationally and internationally, as well as having
written several books. Recently his focus has been on very large databases, data warehousing,
training, and remote database administration. Jeff has been in consulting for fourteen years,
training for nine, and in the software business for eighteen. He has a demonstrated talent for
staying at the leading edge of technology, as well as transferring his knowledge to others. Jeff
currently is CEO of Soaring Eagle Consulting, an RDBMS consulting and training firm based
in Tampa, which specializes in solving business problems as well as performance problems.
Soaring Eagle now offers remote database administration services.

Eric Miner has been with Sybase since October 1992, starting as an associate technical
support engineer. After a couple of years on the phones working with customers, he began to
study the optimizer, and moved through the ranks of Technical Support and into the 'back line'
work of Product Support Engineering. After leaving Sybase for nine months (and realizing that
the grass isn't always greener on the other side), he returned and joined the Optimizer Group in
ASE Engineering. Eric has spoken at four Techwave conferences, usually giving optimizer-
related presentations. He has written several optimizer-related articles for the ISUG Technical
Journal and has spoken to various ISUG groups in Europe. He has contributed to the Sybase
Performance and Tuning course offered by Sybase Education and conducted many internal
training sessions for both support and consulting staff, as well as made technical presentations
on site for customers and spoken on two webcasts. As an escape from work, Eric does digital
stereo photography and is very involved in the online community of stereo photographers
worldwide.

Joel DuPlessis is a senior database consultant working in the New York metropolitan area.
Eclectic by nature, he holds advanced degrees in history and business and has been working
in information technology since 1983. During his tenure he has worked on a wide variety of
databases, both relational and non-relational.

Alvin Chang is a technical trainer, author, and consultant working for Soaring Eagle
Consulting, a Tampa-based consulting and training firm. He has taught Adaptive Server
Enterprise throughout the United States. Currently specializing in Sybase Adaptive Server and
Microsoft SQL Server System Administration, Alvin started as a technical trainer for products
like Microsoft Office, Lotus SmartSuite, and Lotus Notes before moving into RDBMSs. This
is his sixth book. He can be reached at alvin@soaringeagleltd.com.

Yuval Malchi is a database administrator working for Investment Technology Group (ITG)
Israel. Yuval currently manages ITG Europe's databases throughout its European offices.
Prior to joining the office in Israel, Yuval worked at ITG headquarters in New York. He also
worked for Sybase Professional Services and managed several large projects while there,
involving tens of thousands of users, database design of major systems, and performance
tuning assignments. Yuval's current field of expertise is in performance tuning of ASE
systems and SQL. He currently serves as the president of ISUG Israel and lectures at Sybase
Israel conferences.

Acknowledgments

First, I'd like to thank Celeste Noren and Dale Engle for having the vision to pursue a new
ASE book. I'd like to thank all of the people whose time and support helped make this
possible: Jim Hill and Wes Beckwith at Wordware Publishing; Omkar Bhongir and Jan Gipe,
who coordinated things on the Sybase side; Ghaf Toorani, who took up the challenge of
technically checking this work; and my coauthors, many of whom suddenly realized the
amount of work that goes into a book.

Most of all, I'd like to thank my family, without whose support I might never have slept.
Thank you all.

A very special thanks to the talented DBAs who helped with this book by writing or
reviewing a chapter or two, but who didn't have time to help with the entire book: Steve
Chowles, Rey-Luh Wang, Malathy Sundaram, Shashikant Bhandari, Rick Kinnaird, Paul
Richardson, and David Straiton.

Jeff Garbus

I'd like to thank my friends and colleagues in the Sybase ASE Engineering Optimizer Group
for all their support while I worked on my chapters for this book. And of course, I'd like to
thank my wife, Laurie, and son, Liam, for the ever-present patience and support they give me
while working on all the various projects I've done over the years.

Eric Miner

I'd like to thank Jeff and Penny Garbus for the opportunity to contribute and for their
continuing support; my brother, Basil, the doctor, whose persistence and determination inspire
me; my parents, Yuling and Yuching, whom I owe more than I can say; the gang at Equant;
and my friends, Carrie, Ray, Shannon, Steve, and Stephanie, whose company and friendship
mean the world to me.

Alvin Chang

I want to thank ITG Israel for being more of a home than a workplace and especially to Tuval
who makes sure it stays this way. I also want to thank ITG's Data Center where I wrote my
first SELECT statement. And last but not least, I want to thank Ricardo Murcia, who was the
most patient mentor I will ever have.

Yuval Malchi

Foreword
Dear Reader,

Thank you for your interest in the latest version of Sybase's flagship product, Adaptive
Server® Enterprise (ASE). I am responsible for the strategic oversight of Sybase's core
database and middleware business, including that of ASE. For more than 15 years, Sybase's
RDBMS has been supporting mission-critical OLTP and DSS applications for thousands of
customers in an array of industries.

The most recent version of Sybase's RDBMS, ASE 12.5, builds upon this solid foundation
with innovations that help meet the emerging needs of businesses as they transition to e-
Business, including efficient tools for handling new e-Business data types (such as XML),
dynamic performance management to address the unpredictable nature of Internet-based
computing, and enhanced security to protect business data in a highly distributed Web-based
environment.

Over the past few years, there have been a number of titles released on the Sybase enterprise
database management system. Administrator's Guide to Sybase ASE 12.5, released late last
year, was widely adopted by consultants, DBAs, and administrators of Sybase® ASE. Sybase
ASE 12.5 High Availability, released in early 2002, is especially appropriate to those whose
businesses demand 24x7 availability of data and applications. The current title, Sybase ASE
12.5 Performance and Tuning, promises to be just as valuable, especially to DBAs whose
primary task is database maintenance and troubleshooting. Jeffrey Garbus is considered
throughout the Sybase community as one of the most experienced database administrators,
consultants, and authors for the ASE enterprise data management software. Garbus covers a
breadth of new ground as the series editor for the Jeffrey Garbus Official Sybase ASE 12.5
Library from Wordware Publishing. I believe that the latest offering within the ASE series
will be an invaluable asset to Sybase database administrators and consultants.

Dr. Raj Nathan


Senior Vice President and General Manager
Enterprise Solutions Division
Sybase, Inc.

Preface
It was pretty much a toss-up.

When offered the opportunity by Wordware Publishing to write a series of books about
Sybase databases, I had to decide what was most topical. What did I want to write about first?
My inclinations were to write on systems administration and performance and tuning and
what had changed the most since the last Sybase books had been published. I made the
(somewhat arbitrary) decision that there are more people performing systems administrator
duties than performance and tuning, so I chose to do an SA book first (Administrator's Guide
to Sybase® ASE 12.5). Since then, I've also written a book about High Availability.

This book is the next step: Your system is up, and 2,000 users are about to hit it. Are you
prepared? Let's talk performance.

Rather than treating this book as a performance and tuning course that you would take at an
educational facility, think of it as a series of essays written by talented, experienced folks,
passed through the hands of several others equally talented, and evolving into the chapters
you see in front of you. Each chapter has passed through the hands of at least five experienced
database administrators.

I hope you get as much benefit out of using this book as we had fun writing it.

Jeff Garbus

Chapter 1: Introduction
Defining Performance

How do you define performance? It is defined differently for everyone, but the following
points include some common considerations:

• Query response time

How long does it take to get the response to my query? Many users have grown
accustomed to subsecond response. Is this always reasonable? What if there are ten
million rows to be analyzed? Is subsecond response time still reasonable? Perhaps. It
will depend upon your hardware and network.

• Throughput

How many transactions can be completed within a defined time period? Is 100
transactions per second (TPS) reasonable or feasible? Some shops hit 100 TPS as a
matter of course (a recent client occasionally hit 400 TPS, as measured by sp_sysmon,
during peak performance on a 12-processor Sun E10000). Is that reasonable for you?
It will depend upon your hardware and how you define a transaction.

• Concurrency

What types of transactions might be performed simultaneously? Can you run your
overnight processing while users are still online?

• Concurrent users

Can 5,000 users run your application simultaneously?

• Ability to run all batch processing in a given batch window

You will almost certainly find that if a user is complaining about performance, one of the
above categories will factor in the complaint. The mission of this book is to teach you how to
evaluate the problem and solve it.
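
For reference, throughput figures like the TPS numbers mentioned above are usually taken
from sp_sysmon output. A minimal invocation might look like the following (the ten-minute
sampling interval is only an example; choose one that covers your peak period):

sp_sysmon "00:10:00"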

Trade-offs

Tuning for performance implies just what tuning means: adjustments and balancing. If your
resources are fixed, then you are involved in a zero-sum game. In other words, giving
resources to one system area will typically take resources from another; if you create a named
cache, you are taking memory from the system for other uses.

Normalization vs. Performance vs. Flexibility

Ask any data modeler; you'll hear that normalization is a goal in and of itself. I have one
client who requires a DBA to submit a three-page justification in order to denormalize a
database.
Normalization is a good start; it allows a rational gestalt of the information in your database.
It is not, however, necessarily the best design for performance. Redundant data, for
example, may save a join, additional I/O, and extra lookups.

Retrieval Speed vs. Update Speed

A variety of query types may require a variety of indexes on a table. On the other hand, the
more indexes you have on a table, the more overhead you suffer at update time. You need to
balance retrieval needs against update performance.

Ease of Use vs. High-Volume Transaction Processing

This is a combination of the first two examples. The question becomes, 'How easy can we
make this database for developers and other users to understand, versus what tricks
(denormalization, for example) might we use for high-speed transaction processing?'

Storage vs. Cost

Another denormalization issue pertains to redundant data. Are we willing to buy extra storage
in order to add indexes, create summary tables, or maintain separate copies of the data strictly for query
purposes?

The RDBMS

The great myth of relational databases is that you don't specify how to get the data; you
specify what you want, and the server figures out the rest.

Well, this is mostly true.

The server really will return the data, but if you don't have a good physical design, it may
take weeks for the server to retrieve the data. You, the DBA, must provide a proper physical
design to make this efficient.

Expectations

Setting proper expectations is crucial to perceived performance. If a server has to search ten
million rows and does so in three seconds, this might be fast; your user may perceive that as
slow, however, because of expectations of subsecond performance.

create table orders (..., item_num int, warehouse int, ...)


create index ord_idx on
orders (item_num, warehouse)
** Table is approximately 2 GB **

Query 1
select sum (qty) from orders
where item_num=1234
and warehouse=432

Response time is subsecond. Is performance acceptable? Does it matter?


Regardless, here is a query that will use the index perfectly; this query should zero in directly
on the required row(s) and retrieve the appropriate information. The only exception might be
if a significant number of rows were going to be affected by the query (for example, if that
item number and warehouse were the only values in the big table, so that most rows qualify).

Query 2
select sum (qty) from orders
where warehouse=432

Response time is 500 seconds. Is performance acceptable? Probably not. Is it unexpected? No.

This query will not use the index at all; since this query will table scan, you are unlikely to get
subsecond response time unless the disk is very fast or all of the data is in a very fast cache.

Query 3
select sum (qty) from orders
where item_num in (1234,2345)
and warehouse = 432

Response time is 50 seconds. Is performance acceptable? Probably not. Is it unexpected? Probably.

This query looks exactly like the first, except that it would be performed twice. My approach
would be to break it up and see if both queries were subsecond; if so, we probably have a
DBMS problem. (Yes, this example was from real life many years ago, though not Adaptive
Server Enterprise.) If response time is very long, perhaps we don't understand the data as well
as we thought.
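
For example, the breakup described above amounts to running the two single-value
components separately and comparing their response times:

select sum (qty) from orders
where item_num=1234
and warehouse=432

select sum (qty) from orders
where item_num=2345
and warehouse=432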

Defining and Tracking Down Bottlenecks

A bottleneck is the resource that is limiting the performance of your system. Therefore,
elimination of a bottleneck, by definition, has the net effect of shifting the bottleneck. In other
words, there is always a bottleneck. With any luck, though, when you shift a bottleneck, the
new bottleneck is wider.

Typical bottlenecks will be dependent upon your application and hardware. Over the course
of this book, we'll talk about how to identify problems in specific areas.

Performance Variables that Might Cause Bottlenecks

• Architecture
o CPU/SMP management
o I/O
o Network
o Concurrency
• Application
o Query
o Logical design
o Physical design
• Server
o Configuration
o Optimizer
o Lock management
• Concurrency
o Maintenance activity (dbcc, etc.)
o Dump/load
o Index creation
o Batch activity
o Other users
• Query
o Correct query
o Appropriate joins
o Create SARGs (search arguments)
o Updates in place
• Logical design
o Denormalize from third to second normal form
o Vertical/horizontal segmentation of infrequently referenced data
• Physical design
o Too many/few indexes
o Summary data
o Redundant data
o Using segments
• Application issues
o Poorly written SQL
o Repeatedly issuing the same SQL
o Cursor use/misuse
o Batch issues
o Division of work between client and server
• Server configuration
o Memory/resource configuration
o Cache, locks, disks, connections
o Network packet size
o Optimizer
• Path selection
o Tactic
o Forcing indexes
• Lock Management
o Placed/released appropriately
o Deadlock avoidance
• Overhead
o Transaction logging
• Concurrency
o Maintenance activity (dbcc, etc.)
o Dump/load
o Index creation
o Batch activity
o Locking
Summary: Tuning Approach

Tuning is a balancing act, and as such, you need to prioritize your needs. Once you've
prioritized, begin your tuning process.

• Evaluate baseline time for your CPU/drive to determine the expected performance.
• Decide whether the response time is appropriate for the best path the optimizer could
take. Are the results expected?
• Examine the problem query. Is it too complex? Does the query resolve the user's
actual need?
• Are appropriate indexes selected for the query?
• Is the optimizer selecting the correct path?
• Break down the query; do individual components also take too long?
• Acquire as much information as possible regarding the circumstances of the query.
• Remember that you can't tune for everything; a physical design will, by nature, be
optimized for specific types of queries.
• Spotlight obvious problems (see all succeeding chapters).
• Consider tuning options that are transparent to users: indexes, segments, etc.
• Estimate the requirements to resolve problems prior to final rollout. Find (or design) a
tool which will simulate user activity, and act on acquired information.

Chapter 2: Physical Database Design Issues


From a purely practical standpoint, you'll find that 95 percent of query tuning is index
selection. In order to do this correctly, you have to understand how the physical storage
structures work (including how space is managed) and how the server chooses indexes.

Physical Storage Structures

Adaptive Server manages space with allocation pages, GAM pages, and OAM pages.
Somewhat out of the scope of this book (see our systems administration book, Administrator's
Guide to Sybase® ASE 12.5), allocation pages track used pages within database fragments,
GAM pages manage allocation pages, and OAM pages are used on a per-object basis to
identify which allocation pages contain allocated extents for an object.

The goal of the data storage structure is to improve performance by reducing the I/O
necessary to retrieve data. Adaptive Server supports two basic storage structures: linked
pages (used for data) and B-tree structures with pointers up, down, and across levels (used for
indexes).

Both of these structure types take different physical forms if you opt to use the data-only
locking (DOL) schemes available since version 11.9.2; evaluating the use and costs of DOL is
also outside the scope of this book. We will assume that tables and indexes are using the
installation default of the allpages locking (APL) scheme.

Note There is a trade-off between CPU and I/O; storage structures are much more important
if the application is I/O bound.

Space is initially defined for use by Adaptive Server with the disk init command. It is
allocated for data with the create database command.

Although space is defined by number of pages (2K by default, but sizes up to 16K are
available in 12.5), Adaptive Server divides this space into allocation units of 256 data pages
each. Each allocation unit is controlled through an allocation page.

Page Utilization Percent

ASE provides for configuration of the page utilization percent to allow Adaptive Server to
allocate new extents to a table or index rather than search the OAM page chain to find
available pages. Page utilization percent is the ratio of used pages to allocated pages; the
default value is 95. When the ratio of used pages to allocated pages exceeds the page
utilization percent, Adaptive Server simply allocates a new extent to the table or index. This
saves a sometimes exhaustive search of OAM pages to find allocated but unused pages. As
databases get large (tens to hundreds of gigabytes), this is a performance benefit, as the server
does not need to walk all of the extents to find free space.

Lowering the page utilization percent can speed up processes performing a large amount of
insert activity on a table, but it will increase the number of pages allocated to the table.

Example:

sp_configure 'page utilization percent', 85

OAM and GAM Pages

In brief, the database is broken into logical units of 256 pages called allocation units. The first
page of each of these units is known as the allocation page and does not contain table or
index data; it only contains information about the contents of the allocation unit (including
which pages are in use). Thus, every 256th page of the database (starting from page 0) is one
of these allocation pages. See the figure on the following page.

The allocation unit is further divided into logical structures known as extents, which are each
eight pages long (with 256 pages per allocation unit, this makes 32 extents). Extents are
important because they are the unit of space allocation and deallocation for a table or index. All
eight pages of an extent are assigned to the same table or index, although they may not all be
in use right now. A special exception applies to the first extent of an allocation unit, where
only seven pages are available for allocation to objects.

While the allocation page tracks the use of space for a given region of the database, the object
allocation map page determines where in a database a given object is assigned space. OAM
pages contain various details about a table or index, but we are most concerned now with the
allocation page entries; if an object is assigned any amount of space in an allocation unit
(from the minimum of one extent to the maximum of 32 extents), the OAM for the object will
contain a pointer to that unit's allocation page. Reading the OAM page for an object will
provide a list of allocation pages, which will each provide pointers to the particular used
pages for the object. These scans can also be used to find pages belonging to an object that are
currently unused.
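
If you want to see where an object's OAM pages live, sysindexes records them. A quick look
might resemble the following (a sketch; doampg and ioampg are the OAM page numbers for
the data and index pages, and the table name orders is just an example):

select name, indid, doampg, ioampg
from sysindexes
where id = object_id('orders')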

Lastly, the global allocation map page provides a quick list of allocation units with
unassigned extents. When it's necessary to find an extent available for allocation to a table or
index, one method is to read the GAM page.

Data Pages

The diagram above reflects the default 2K data page. ASE allocates at the extent level and
stores at the page level. Every page has a header, which contains information about the page
(page type, locking-specific details, available space, and other details) and the body, which
contains data. Here we're diagramming a data page. If the table uses the default locking
mechanism, allpages (the only mechanism available before 11.9.2), we have a 32-byte header
and 2016 usable bytes on a page. If the table uses DOL locking, we have a 46-byte header and
2004 bytes available on the page.

Note the word 'usable'; for APL tables, we can't actually have a 2016-byte row because when
we log the row on a log page, we require an additional 54 bytes of overhead. This would split
the row across pages, which is not permitted.

Also note that for all data pages, there is a series of 2-byte pointers at the end of the page
known as the row offset table. These entries indicate the location of rows on the page.

Additional information you need to know about pages:

• Rows will not cross page boundaries; note max row and page sizes.
• Each row has at least 4 bytes of additional overhead (itemized later).

Question: What is the minimum storage a 1010-byte row will occupy?

Answer: Since you cannot fit two of these rows onto one page, each row will be stored on its
own 2K page. If you're not careful at physical design time, you can waste storage; this will
give you performance problems because reads will be slower.

Estimating Table Size


When a Table Hasn't Been Loaded Yet
sp_estspace pt_tx_CIamountNCamount, 1000000, 75
go

name                    type          idx_level  Pages  Kbytes
----------------------  ------------  ---------  -----  ------
pt_tx_CIamountNCamount  data                  0  15874   31749
CIamount                clustered             0    105     208
CIamount                clustered             1      1       2
NCamount                nonclustered          0  10001   20002
NCamount                nonclustered          1     97     194
NCamount                nonclustered          2      1       2

Total_Mbytes
------------
50.93

name      type          total_pages  time_mins
--------  ------------  -----------  ---------
CIamount  clustered           15980         62
NCamount  nonclustered        10099         42

The sp_estspace stored procedure exists to allow you to estimate how much space your table
will need for storage. In order to use this procedure to the best effect, you need to first create
the tables and indexes so that the procedure can read the structures and sizes from the system
tables for its calculations.

In the above figure, we are sizing the table pt_tx_CIamountNCamount. We are instructing the
server to size this table for 1,000,000 rows, assuming a fillfactor of 75 percent. The fillfactor
indicates that the pages will be initially set to approximately 75 percent full to allow for some
growth; a smaller fillfactor value will allow more rows to be added before additional pages
need to be assigned but will require many more initial pages to hold the same number of rows.
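
A minimal sketch of that workflow, using a hypothetical table (the names, columns, and sizes
are examples only): create the table and its indexes first, even if they are empty, and then run
sp_estspace against them.

create table orders_est
(ord_id    int,
 item_num  int,
 warehouse int,
 qty       int)
go

create clustered index CIord on orders_est (ord_id)
create nonclustered index NCitem on orders_est (item_num, warehouse)
go

sp_estspace orders_est, 1000000, 75
go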

Estimating Table Size for an Existing Table

For an existing table, use the stored procedure sp_spaceused or dbcc checktable.

dbcc checktable (sample)

The total number of data pages in this table is 244.
Table has 5772 data rows.
DBCC execution completed. If DBCC printed error messages,
contact a user with System Administrator (SA) role.

Table size may also be calculated by hand, but as these calculations are also estimates, using
these tools is far simpler and likely to be as accurate or more accurate.

sp_spaceused sample

name    rowtotal  reserved  data    index_size  unused
------  --------  --------  ------  ----------  ------
sample  5772      494 KB    488 KB  0 KB        6 KB

Estimating Performance

A table scan is a sequential read of every data page in a table. Table scans are used if there is
no appropriate index or if an index would slow down the process. The time required for a
table scan is directly proportional to the size of the table in pages.

To get exact logical counts, set statistics io on prior to running a query.

Note Knowing the amount of time a table scan will take is useful for index selection, as well
as determining maximum expected query execution time.

Table Scan Time

It is useful to be able to accurately measure the page access speed of your server. Here is one
technique:

1. Create a large table.
2. Cycle the server, so that there are no pages in cache.
3. set statistics io on
4. set statistics time on
   select count (*) from table_name
5. Look at the physical page reads reported, and divide by the total elapsed time. As of
   publication, typical page access speeds are between 200 and 3,000 pages per second.
6. At this point, you can estimate how long a scan would take for a specific table,
   although the typical values shown above will give us a very large range of
   possibilities.

For a 1,333,334-page table:

1,333,334/200 = 6667 seconds = ~ 111 minutes

or

1,333,334/3,000 = 444 seconds = ~ 7.5 minutes
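
Putting steps 3 through 5 together, a consolidated sketch might look like this (big_table is a
placeholder for your own large table):

set statistics io on
set statistics time on
go

select count(*) from big_table
go

-- Divide the physical reads reported by statistics io by the elapsed time reported
-- by statistics time to get the page access speed (pages per second) of this server.
set statistics io off
set statistics time off
go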

Table Scan Performance

There are several ways to improve table scan time. Adding an index is typically the first and
easiest choice, in the hopes of avoiding the table scan in the first place. In fact, it may be
reasonably stated that the purpose of an index is to avoid a table scan.

When the table scan cannot be avoided, it frequently helps to spread I/O across multiple disks
and controllers. This may include horizontal or vertical partitioning to reduce the amount of
data to be scanned. Cache can be increased so that more I/O is logical, rather than physical (a
20:1 performance ratio, at publication).

Faster drives (solid state accelerators, for example) or faster disk access methods might help
(raw partitions may be 40 to 50 percent faster than file systems on a UNIX box).

A 16K buffer pool in the appropriate cache may have a dramatic, positive effect on scan
performance.
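
For example, a 16K pool can be carved out of a named cache with sp_poolconfig once the
cache exists; the cache name and sizes below are only examples:

sp_cacheconfig "orders_cache", "100M"
go
sp_poolconfig "orders_cache", "32M", "16K"
go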

Note The server will table scan when it calculates that it is cheaper to scan than to access via an
index. The server assumes this threshold is at around 20% of total data (we'll quantify these
calculations later).

Indexes and B-Tree Structure

We've talked all around indexes, so we must finally describe them. Indexes are storage
structures separate from the data pages. Their benefits include:

• Providing faster access to data.
• Helping to avoid table scans.
• Setting boundaries for a table scan.
• Helping to avoid some sorts.
• May be used to enforce uniqueness.

The figure on the following page depicts an index that is clustered on a Last Name column.
Index Structures

All ASE indexes are B-tree structures.

Index Structures Notes

There may be many intermediate levels, depending on key width, row count, and type of
index. Different design books count levels from different directions; it is the number of levels
that is important.

Storage structures are different for DOL locking mechanisms. Although DOL may lock data
at either the page level or row level, depending on the scheme chosen, it will not lock index
entries at all. Instead, a new structure known as a latch (or, more formally, a nested top level
action) will permit consistent access and change of index values. Also note that DOL indexes
don't have sibling links, except on the leaf.

Index Types (APL)

• Clustered indexes - The data is physically sorted in index order, and the data page is
the leaf level of the index. There is a maximum of one clustered index per table
because we can only physically sort the data one way.
• Non-clustered indexes - Index is independent of the physical sort order. There is a
maximum of 249 non-clustered indexes per table.

Tables can (and typically do) have both types of indexes. Either index is limited to a
maximum key width, dependent upon the size of the data pages.

Because you can create tables with columns wider than the limit for the index key, you will be
unable to index these columns. For example, if you perform the following on a 2K page
server and try to create an index on col3, the command fails and Adaptive Server issues an
error message because column col3 is larger than the index row-size limit (600 bytes).

create table tab1 (
col1 int,
col2 int,
col3 char(700))

'Unable to index' does not mean that you cannot use these columns in search clauses. Even
though a column is non-indexable (as in col3 above), you can still create statistics for it. Also,
if you include the column in a where clause, it will be evaluated during optimization.

Statistics Maintained on the First 255 Bytes of Data

Optimizer statistics are various measures of table size (page and row counts, row size, etc.)
and data distribution used to determine the most efficient access method for a query (index,
table scan, etc.). Different statistics are measured and maintained by different means, some by
the server automatically and some by the action of an owner or administrator. Regardless of
how you gather statistics, they are maintained only for the first 255 bytes of data in a column.
If you use wide columns, any information after the first 255 bytes is considered statistically
insignificant by the optimizer.
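
For example, column-level statistics can be created on the non-indexable col3 from the table
above with update statistics (a sketch; on a 2K page server only the first 255 bytes of the
700-byte column are reflected in the statistics):

update statistics tab1 (col3)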

Page Size           User-visible Index Row-size Limit   Internal Index Row-size Limit
2K (2048 bytes)     600                                  650
4K (4096 bytes)     1250                                 1310
8K (8192 bytes)     2600                                 2670
16K (16384 bytes)   5300                                 5390

Index Types (DOL)

The data is still logically sorted in index order, but it may not be sorted on the page (this
means that clustered indexes aren't truly clustered, so we will often refer to them as
'placement indexes' instead). The storage structures are identical for both clustered and
non-clustered indexes, and
there are no longer sibling pointers at non-leaf index levels. Index pages now have row offset
tables to determine the order of keys, since they are not necessarily stored in order on the
index page. Ordinarily, a clustered index is identified by Index ID 1; for DOL tables, this
value is no longer used. The one DOL CI is identified with 0x200 (decimal 512) in the status2
column of the sysindexes table.
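
As an illustration, the following query (a sketch based on the bit value just described) lists
the DOL placement indexes in the current database:

select table_name = object_name(id), index_name = name
from sysindexes
where status2 & 512 = 512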

DOL indexes also have some structural changes designed to improve their storage efficiency:
Duplicate keys in non-unique indexes are stored only once per index page, and key values on
non-leaf pages are reduced to the minimum possible byte length by using suffix compression.

Clustered Indexes

Clustered indexes are used to direct placement of data rows in sorted order. Rows may not be
sorted on the page. The data pages are not leaf pages.

Advantages:

• Data modifications are faster.
• Less movement on page.
• DOL index B-trees are more efficient.

Disadvantages:
• Retrieval may be slower (more I/O to get a row).
• Large I/O may not be as effective.
• May need 40 to 50 percent more storage.

For APL clustered indexes, the leaf level is the data level, and the prior index level needs only
one entry per page of data.

Non-Clustered Indexes

The non-clustered index requires an extra level, as it needs pointers for every row of data.

Index levels point to succeeding index levels. The B-tree is traversed by following these
pointers.
This figure shows a non-clustered index on a First Name column.

Descending Indexes

create index i1
on sales (id asc, purchasedate desc)

This will create an index on the sales table. If you look frequently for the most recent
purchase, you will avoid scanning through older purchases. This can help avoid backward
scans, which are a frequent cause of deadlocks.
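
For example, a query of this form (the id value is arbitrary) can read the newest purchases
for one id straight off the index, in order, with no sort and no backward scan:

select purchasedate
from sales
where id = 42
order by purchasedate desc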

New Rows

APL tables with a clustered index will place rows in the table according to the clustered index
order. APL heap tables will add rows in the last position of the last page of the table.

DOL tables without a placement index have new rows added to a designated insert page.
Unlike APL tables, we cannot say that this page is at the end of the table, since DOL tables do
not maintain page pointers or a page order. The last logical page number (on APL) or insert
page number (on DOL) is stored in sysindexes as the root page.

Row Removal

When the last row is removed from an APL data or leaf page, that page is removed from its
current page chain and marked as 'available' on the allocation page. When rows are removed
from the middle of a page, the remaining rows are adjusted upward on the page so that free
space is left at the end of the page. Removing rows causes deletions to cascade into all non-
clustered indexes that point to rows on that page.

Rows that are deleted from DOL tables are marked as logically deleted, but they are not
physically removed from the page immediately (some of this space will be recovered with
time and some when maintenance commands, like reorg, are used). Tables with a placement
index have rows added on the appropriate page, if there is room; otherwise, the row will be
placed on the nearest available page. DOL leaf-level index pages will split, since the entries
must remain in index order.
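
When enough logically deleted space accumulates in a DOL table, it can be reclaimed with
the reorg command mentioned above (the table name is an example):

reorg reclaim_space orders_dol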

Implications of Modification

Data modification statements can have a variety of adverse impacts on performance in regard
to indexes. Updates and deletes will frequently cause rows to move, both in the data and in the
index. If an update or delete causes an APL row to move, all non-clustered indexes on that
table will need to be updated to reflect the new row position. By contrast, a DOL row that
moves will not update its index entries, but it will leave behind a forwarding address to the
new physical location. Likewise, inserts into an APL data page of a table with a clustered
index will frequently cause rows on the page to be moved down, or they will cause the page to
split. All non-clustered indexes on the table will need to be updated to reflect the new row
position(s).
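
For DOL tables, rows that have moved and left forwarding addresses can be put back in place
with reorg as well (again, the table name is an example):

reorg forwarded_rows orders_dol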

Setting Fill Factor

The default fill factor is set at the server level, but it can be overridden when indexes are
created.

Server-wide default:

sp_configure 'fill factor', N


Overriding at index creation:

create index idx_name on table (column)


with fillfactor=N

In both cases, N is the percentage full, except:

N = 100   Data and index pages full, except root.
N = 0     Full data pages, indexes at 75 percent.

Choosing Indexes

Sometimes, the clustered indexes are created based on the primary key of the tables. In a way,
this makes sense. A clustered index tends to be one I/O faster than a non-clustered index for a
single-row lookup. So many database design tools automatically make this selection: 'If this is
the primary way we go after the data, we'll place the clustered index on this one and save the
I/O whenever we retrieve the data.'

However, there are other reasons to choose: Remember, clustered indexes have the data
physically sorted in index order, so that makes it a great choice for any range searches or
queries with an order by clause.

Non-clustered indexes are a bit slower and take up much more disk space, but they are the
next best alternative to a table scan. For some queries, known as covered queries, non-
clustered indexes can be faster than clustered indexes.
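
For instance, a non-clustered index whose key contains every column a query touches covers
that query; the data pages are never read at all. A sketch, reusing the orders example from
Chapter 1 (the index name is hypothetical):

create index ord_cover_idx on
orders (item_num, warehouse, qty)

select sum (qty) from orders
where item_num=1234
and warehouse=432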

When creating a clustered index, you need free space in your database equal to approximately
120 percent of the total table size.

Summary

Ninety-five percent of tuning is based upon index selection. Choose your indexes wisely,
based upon your understanding of physical storage.

Chapter 3: I/O - The Complete Works


Introduction to I/O

As each year passes, the amount of data available online increases. This is partly due to
advances in disk technology, with vendors providing larger disks at competitive prices. The
computers themselves are improving as well, creating an imbalance in performance between
the processors and the disk storage.

With more data available to query, the database administrator now has to understand and tune
the I/O system in order to provide adequate response times for retrieving and writing data.
This chapter is not only aimed at database administrators that have systems storing terabytes
of information, but to those that have systems with a few hundred megabytes as well, as these
can also be problematic. Its goal is to provide you with enough information so you can
investigate and understand the systems you look after and to enable you to understand where
the I/O bottlenecks are.

A common question I hear regularly is, 'sp_sysmon shows a device performing 100 I/O per
second, is this a problem?' I also hear DBAs say 'sp_sysmon shows cache misses at 1%, and
1% is small, so this must be OK.'

A device with 100 I/O per second is only a problem if 100 I/O per second is the maximum
throughput of your I/O system. If the maximum is 1000, then your I/O subsystem is only 10%
utilized.

Cache misses at 1% is a misleading figure. The figure you should be concerned with is the I/O
per second for cache misses. If 1% equates to 1000 I/O per second, then this could indicate
that your I/O subsystem is saturated.

You will need to understand your application and the hardware it is running on. You will need
to be aware of the operating system and hardware limits. This will enable you to make
educated guesses when it comes to configuring and tuning your ASE servers and its
associated hardware.

I will not be biased toward any operating system or hardware vendor. I will, however, quote
performance and throughput figures to provide a better understanding of the subject. I will
stress that technology is changing and the reader will need to investigate his or her system to
get an idea of its limitations.

Before we start talking about configuring disks, setting operating system parameters, or
configuring ASE, we need to have a basic knowledge of the hardware. Since this chapter is
devoted to I/O, the emphasis is on disks.

Once you grasp I/O, never again will you look at someone blankly when he asks you, 'How
long would you expect a dbcc or update statistics to take?'

Let's begin by talking about disks.

Anatomy of a Hard Disk Drive

Hard disk drives, or disks, are made up of multiple platters and controlled by a disk controller.
The old style disks did not store very much data - about a megabyte in the beginning. When a
read or write request arrived, the controller would carry out the action and return the results. It
would then wait for the next request to arrive.

Today's disks are more intelligent than their predecessors. The drives themselves now contain
CPUs and memory.

Disk CPUs

Besides transferring data to and from the bus, the CPU can order requests to reduce head
movement and amalgamate requests to read and write bigger chunks of data. Since the latency
and seek times are reduced, the overall throughput improves. Depending on the number of
requests arriving, this chopping and changing can lead to varying levels of performance.

There are different standards to control disks. I will be discussing SCSI drives only, as IDE
and EIDE are not up to the job. There are others, but these are the more well-known ones.

Disk Memory

The disk memory, or cache, is part of the disk drive. It is used to store read and write requests,
as well as the data that has been read or the data that needs to be written. It can vary in size,
but 128 KB is typical. To increase throughput, the controller can prefetch data using an
algorithm in the drive. It looks for access patterns and orders I/O accordingly.

This can improve sequential access in lightly loaded systems, but it may decrease
performance when there are multiple sequential accesses competing with each other.

With SCSI drives, the larger the cache, the more requests it can queue. Generally, around 64
commands can be queued with 512 KB caches before the sender has to wait. The SCSI-2 and
SCSI-3 protocols allow up to 256 commands to be queued to each drive. It actually states, 'to
each Logical Unit Number (LUN),' but drive is good enough.

I will discuss the SCSI protocol later in this chapter.

Host Bus Adapter (HBA)

The machine your application runs on is called the host. In order for data to be transferred
between your host's memory and the disk, you need to connect your host's data bus to the
disk's bus, commonly called the SCSI bus.

A host bus adapter's job is to act as a middleman between your host and disk. It will convert
requests for data by sending electronic signals to your disk. It will also queue requests when
the number of outstanding requests reaches the limit.

Access Theory

SCSI accesses a disk by block number, and there is no way to tell which sector, cylinder, or
head will be used. The reason for this is that the outer tracks are larger than the inner tracks
and can hold more data. Although UNIX still refers to disks using the sector, cylinder, and
head geometry, the SCSI device driver, or Logical Volume Manager, if installed, converts
each request to the required SCSI block number.

A SCSI device driver is a piece of software written by the manufacturer that enables the
operating system to access the drive without knowing its intricacies. This way, the operating
system can communicate with drives from different manufacturers.

Find Out What Disks You are Using

It is important that you understand what type of disks you are using to store data. It doesn't
matter if these are locally attached to your host or in their own disk array.

The disks will be the slowest part of the system, and it is only when we perform reads across
multiple disks simultaneously that we can get the throughput.

All disks are different, so check the manufacturer's specifications for the following:

• SCSI Type - The SCSI standard has been around for a while. As each new version is
released, the ability to handle greater speeds increases. Knowing which standard your
drives conform to will enable you to understand expected transfer times.
• Seek - This is the time it takes to move the heads to the correct tracks. This is typically
around 5 to 40 milliseconds.
• Rotational Latency - This is the time it takes for the data to appear under the head
when the head seeks to the correct track. Disks spin around 10,000 revolutions per
minute (RPMs) and take around 4 to 9 milliseconds to locate the data. These times
will drop when the new 15,000 RPM drives are used. Basically, the faster it spins, the
quicker the access times.
• Internal Transfer Rate - This is the rate at which the data can be moved from the
media surface to the controller's cache. The figures are quoted in megabits per second.
Rates range between 10 to 100 Mbps (megabits per second).
• External Transfer Rate - This is the rate at which data can be sent from the
controller's cache to the bus. Typical rates are 5 Mbps to 20 Mbps for 10,000 RPM
disks.

All these are subject to change as the newer generations of disks are developed.

It is time to move on and talk about disk arrays and how they are used in Storage Area
Networks, but before I do, I want to bring the following tip to your attention.

UNIX iostat Command Tip

When iostat was written, disks were fairly unintelligent. They were only able to handle a
single read or write request. When the request was received, the host bus adapter (HBA)
would wait for either the data to be returned for a read or an acknowledgement for a write.
Once this was received, it sent the next request. The HBA itself is able to queue around 64 to
256 requests, depending on the amount of memory in its cache.

Since the HBA could only send one request to a disk at a time, the following could easily be
reported:

Service Time = Milliseconds for the request to be completed

Transfers per Second = Number of requests the drive handled

Busy = The time the HBA had requests to send

Queue Length = The number of requests in the HBA


Now that a drive can handle more than one request simultaneously, iostat reports
inaccurate figures. iostat reports Wait Time as the time spent in the queue at the HBA; in fact,
the requests are queued by the disk array or on the disk itself. Because the Wait Time is
wrong, the Service Time is wrong as well: Service Time includes the Wait Time, but the Wait
Time UNIX reports is zero.

What you need is a tool that reports Wait and Service Time for requests in the Wait Queue at
the HBA, and that reports what is referred to as the Active Queue for queuing and servicing
in the disk array or on the physical disk.

If you use any UNIX tools to monitor I/O performance, find out if they are written with
modern disk arrays in mind.

Disk Arrays

Disk arrays are separate storage cabinets housing a bunch of disks. In fact, disk arrays were
also called JBODs, or 'Just a Bunch of Disks.' Connections to hosts are commonly SCSI or
fibre channel cabling.

The idea behind disk arrays is to improve read and write I/O throughput by eliminating the
biggest bottleneck, the single disk drive. Disk arrays have the ability to distribute requests, as
well as additional functionality to reduce the load on the hosts attached to them. This allows
the hosts to continue doing other processing.

There are a wide range of manufacturers providing disk arrays, but rather than talk about one
specific vendor, I will list common features incorporated in them and why they are used. All
the features listed may not be in your disk array, so you will need to read the manufacturer's
specifications.

Also, there are functions provided by disk arrays that I have not mentioned here, as I am only
interested in functions that affect performance.

Memory Caches

A cache is used to speed up writes and to hold data from read requests. The cache uses the
same technology as computer memory, so it accesses in nanoseconds rather than milliseconds.

When a host issues a read, the disk array fetches the data from disk and stores a copy in its
cache. If the host wants to read the same data again, it can be sent directly from the cache
rather than reading it from the slower disk.

When a write is issued, the host would normally get a write acknowledgement when the write
is on the physical disk. Using a cache, the write is acknowledged once it is in the cache. In the
event that the disk array is switched off, the manufacturer has incorporated a battery backup
system that de-stages all the writes from cache to disk.

The cache size varies, but anything from 256 MB to 16 GB is used. This size will be smaller
than the total disk space provided, so there will be times when the cache is full and the
requests will be at the slower disk speeds. This is important to realize, especially when
running any sort of benchmark.

There are many ways to distribute the cache for reads and writes, and it may not be a free-for-
all. Some vendors will allocate, say, 20% for writes and 80% for reads. Also, requests to one
disk may not be able to use the entire cache; the array may distribute the cache evenly
across the disks. It is worth finding out how your disk array's cache is split
up to give you an idea of the expected performance.

When data is written to the cache, it has to be flushed to disk at some point. Software built
into the array will decide when the flushing will occur. If the cache fills up, then dirty writes
will be written to disk. Some arrays can perform different types of copying, and the data gets
flushed on a priority basis.

Alternate Paths

Disk arrays are built with failure in mind. The manufacturer will normally double everything
in case of a hardware failure so the array can continue to function. By incorporating this
feature, they also increase the performance.

All disks have their own controllers. In some arrays, the manufacturer will divide the disks
evenly and add another controller to control the group of disks. As a fail-safe mechanism, an
alternate path is added to these disks in case one fails. This means that between the cache and
disks is a controller and an alternate controller to manage requests.

Algorithms inside the array will load balance requests across the primary controller and the
alternate controller. This enables two requests to be sent between the cache and disks
simultaneously.

Host to Array Adapter

When a host connects to a disk array, it can use multiple connections to enable the host to
send parallel requests. The connections for some time were SCSI, but for greater throughput,
the connection tends to be fibre channel.

Prefetch Algorithm

It would appear that everything is prefetching these days, and disk arrays are no exception.
The firmware inside a disk array will monitor the requests to and from the disks, and if it
detects an access pattern, the array will generate I/O requests of its own.

Hardware Mirroring

Mirroring is not new. It is when two copies of the same data are maintained across two disks.
The idea is that when a write occurs, an extra write is performed to a mirror copy. If the
original copy is corrupted, the data can be retrieved from this mirrored copy.

ASE is able to maintain mirrors, as does a Logical Volume Manager. The problem with using
mirrors is that an additional I/O has to be sent from the host to the disk array, doubling the
writes. Also, an additional I/O request will consume CPU resources and will have some
impact.
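
At the ASE level, a device mirror is established with the disk mirror command; the device
name and mirror path below are examples only:

disk mirror
name = "data_dev1",
mirror = "/sybase/mirrors/data_dev1.mir",
writes = serial
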
Disk arrays support internal mirroring to reduce the load on the host. The array will also take
advantage of having two copies of the same data by distributing read requests between them.

RAID 5 Optimization

One advantage of mirroring is the added safety of knowing your data can be retrieved in the
event of a disk failure. Another advantage is the performance benefit of reading from both
sets of disks. Unfortunately, the downside is cost: If you require 20 disks for your data, you
will need an additional 20 for the mirroring.

RAID 5 allows you to retrieve data in the event of a disk failure without requiring twice the
number of disks. The RAID standard is described later in the chapter, so I will only mention
the improvements of RAID 5 in the disk array.

Previously, the Logical Volume Managers handled RAID 5, but performance was always an
issue. When a write is performed on a RAID 5 volume, additional reads and writes need to be
performed to keep the data in sync in the event of a disk failure. This would require the host
to issue multiple I/O requests and place a load on the machine. The performance hit was
always too great, and so RAID 5 was never used in an I/O-intensive environment.

Disk arrays now come with a RAID 5 optimization built into the hardware. They are actually
Exclusive OR (XOR) engines which prefetch data and calculate parity information much
more quickly. These are a great improvement over the software versions but still lag behind
in performance compared to mirroring and striping.

Striping

I cannot conclude my discussion of disk arrays without mentioning striping. There is a
misconception that all disk arrays understand striping. It is the Volume Manager that controls
striping by directing writes to particular disks. All the disk array sees is a request to write a
block of data to disk. However, new disk arrays are becoming available with striping built in,
so check with your vendor first.

Striping for performance is explained later in this chapter.

Storage Area Networks

Disk arrays are great for storing data, but only one host can see the data stored on a given array. File
servers have been around for a while to provide storage for regular files. This type of storage
is termed Network Access Storage.

Network Access Storage, or NAS, provides access to disk storage using TCP/IP or UDP/IP
over Ethernet or Fibre Channel. A user requests a file by sending a request to the file server,
which would send back the data. Could NAS provide access to a database at any reasonable
speed? The answer is no.

The Internet Protocol (IP) and Ethernet or Fibre Channel are what make up a local area
network (LAN). The IP protocol was originally used for sending and receiving small bursts of
network traffic, which it is very good at. When sending large amounts of data, there is a large
overhead processing all the header information that goes with it. Basically, IP is not up to the
job.

Did you know that it takes around 1000 million processor instructions per second to send
TCP/IP data over Ethernet at 1 GB per second, but it only takes 34 million per second to send
it over Fibre Channel?

The term Storage Area Network, or SAN, is used to describe a network built specifically for
the transmission of data and a storage infrastructure for any number of interconnections
between hosts and disk arrays.

At the physical level, a Storage Area Network connected to a host over Fibre Channel is not
very different from a local area network using Fibre Channel. Each uses the same signaling
and encoding methods, but what makes a Storage Area Network better than a local area
network is that it is focused at optimizing the sending and receiving of data between the host
and disk array. With processor and memory speeds increasing, the I/O side of things is rather
slow to catch up. Therefore, it has to manage the access of data to provide continuous
reliability to whoever needs it.

A SAN is able to sustain about 80% throughput over the network, whereas 30% sustained on
a LAN is doing well.

The protocols used in a SAN are the SCSI-3 Protocol and Fibre Channel Protocol (FCP).

The SCSI Protocol

There are various standards for accessing disk drives, including IDE, EIDE, and SCSI, but it
is SCSI that the disk array vendors choose. Fibre Channel disks are SCSI based in that they
use the SCSI protocol.

While there are numerous reasons SCSI is used in preference to the others, the main reason
vendors choose SCSI is its ability to disconnect from the bus while allowing other disk
requests to be processed. The bus is the connection between the host bus adapter (HBA)
attached to the host and the disk drive. In the case of a disk array, it is the disk array's adapter
that carries the data.

When a read request is received from the host, the command is sent to the disk controller to
fetch the required data. The seek and latency times of the disk are the slowest part in the path
of an I/O, and by disconnecting from the bus, further read and write requests can be issued,
giving better throughput. During this time, the disk controller may receive many requests from
other host bus adapters, so it may start reordering them to improve performance. Once the disk
controller has the requested data, it reconnects to the bus and sends the data to the requesting
host.

A write request is similar to a read request. Instead of sending back data, the disk controller
sends back a write acknowledgement when it reconnects to the bus.

Introduction to SCSI

SCSI is an acronym for Small Computer System Interface. This protocol was written before
disk arrays, and rather than develop new protocols for today's technology, SCSI was enhanced
to provide the power and performance that was needed. Do not be put off by the term 'Small
Computer' in the SCSI acronym; SCSI can control a whole range of different storage
hardware, but we are only concerned with disks.

Devices are attached in parallel on a SCSI bus. Depending on the SCSI type used, only a
certain number of devices can be attached. Each attached device has a unique Logical Unit
Number, or LUN, starting from zero.

The SCSI standard was created so different vendors could develop drives that could be put
into any machine that had a SCSI interface.

SCSI is a command set that hides the mechanics of the drives from the operating system. The
disk drive vendor writes a device driver that sits between the operating system and disk drive.
The device driver's task is to interpret SCSI commands and convert them into something the
disk drive understands.

The SCSI standard has three versions: SCSI-1, SCSI-2, and SCSI-3.

• SCSI-1 was the first SCSI standard. It supported only a few device types, with a
maximum of seven devices on the bus, allowed only one outstanding request per LUN
at any time, and had a peak throughput of 5 MB/sec. (For the sake of clarity, MB/sec is
megabytes per second.)
• SCSI-2 was developed next to increase throughput, provide better error correction,
and support extra devices. You could have up to 16 devices on a single bus, and each
could have up to 256 requests outstanding. Maximum speed was 20 MB/sec.
• SCSI-3 is the newest standard in use today. In fact, it contains a whole family of
command sets for different communication devices, some not even approved by the
standards committee at the time of publication. SCSI-3 is a breakthrough in terms of
throughput, and among other things it added support for large numbers of devices on a
single bus. At present, SCSI-3 using the Fibre Channel Protocol (FCP) can run at
speeds of 100 MB/sec with a promise of increased speeds in the future.

SCSI was first developed as a parallel interface, but to accommodate today's new breed of
high-speed devices, SCSI-3 provides a serial interface. Now, you would think parallel is
faster than serial, and in many cases it is, but coupled with the Fibre Channel Protocol, a
serial bus becomes just as fast. As well as using fewer wires, which makes it more robust for
disk arrays, the protocol added the capability of supporting up to 16 million devices down the
same cable. This enables disk arrays to contain an almost unlimited number of disk drives.

Changing to serial also required device driver writers to rethink the way their drivers scan the
buses for devices: the old approach took roughly a quarter of a second per device, which does
not scale when there are potentially 16 million devices attached.

Initiators and Targets

Initiators are the host bus adapters that link the host's bus to the bus containing either a single
disk, disk array, or Storage Area Network. It is here that communication is initiated.

Targets are the controllers on a disk drive or an interface that connect the network or host to a
disk array.

SCSI IDs and Logical Unit Numbers

If you have a system with only one initiator and one target, you have a pretty small system.
Generally, you will have one initiator and multiple targets, so how do you distinguish them?

The answer is to give each device and the host bus adapter a unique SCSI ID. Imagine it like
house numbers on a street.

Earlier I described LUNs as devices, yet I just stated that each device is assigned a SCSI ID.
In fact, LUNs are devices within devices. The way this works in practice is that every SCSI
ID, from 0 to 7, can have up to eight LUNs, also numbered 0 to 7, giving eight devices
per SCSI ID (SCSI-3 can have up to 64, numbered 0 to 63).
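
In practice, the SCSI ID and LUN show up in the device names the operating system creates for
each disk. As a purely illustrative example (naming schemes vary by platform), an HP-UX style
listing of the raw disk devices might look like the following, where c is the controller
(initiator), t the target (SCSI ID), and d the LUN:

ls /dev/rdsk
c0t0d0     c0t3d0     c0t3d1     c1t0d0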

Speeds

Regardless of whether disk drives are attached directly to a host or reside in a disk array, the
drive itself will be of a certain SCSI specification. It is up to you to know which drives you
have in order to determine the expected throughput. To put it another way, if you had a disk
drive connected by Fibre Channel, but the drive itself was an Ultra SCSI disk, the maximum
throughput you could expect is 20 MB/sec, even though Fibre Channel can handle 100
MB/sec.

The following lists the speeds for each SCSI specification:

SCSI-1 5 MB/sec
Wide SCSI 10 MB/sec
Fast SCSI 10 MB/sec
Fast-wide SCSI 20 MB/sec
Ultra SCSI 20 MB/sec
Ultra-wide SCSI 40 MB/sec
Ultra-2 SCSI 40 MB/sec
Ultra-2 Wide SCSI 80 MB/sec
Ultra-160 SCSI 160 MB/sec

In the future, you will see further advancement in SCSI speeds, along with different names
representing each progression. The standard is continually being enhanced and put up for
approval. However, manufacturers rarely wait for standards to be approved, which is why
documentation often still references SCSI-2, even though most products shipping today
implement the SCSI-3 standard.

Let's look at how Fibre Channel sits on top of the SCSI-3 Protocol to enable Storage Area
Networks to deliver this performance.

The Fibre Channel Protocol

Fibre Channel is a high-end universal pipe. It can connect anything at speeds of up to 100
MB/sec (1 Gbit/sec). It is able to travel over fiber optic and coaxial cable, coaxial being the
cheaper option.

Fibre Channel is one in a family of protocols that works with the SCSI-3 standard to deliver
100 MB/sec performance over a single fiber optic or copper wire cable. Each host requires a
host bus adapter that connects the host's bus to the network cable. The HBA receives a SCSI
command along with any data, and its device driver converts it to something that can be sent
over fibre or coaxial at 100 MB/sec. The receiving end converts the command and data to
signals on a SCSI bus that the target understands. The cycle then repeats.

Let's take a closer look at the bit in the middle.

Introduction to Fibre Channel

Fibre Channel is a list of rules for sending data from A to B. If that is all that is required, it
will do this comfortably with little overhead.

Fibre Channel can be used to your advantage if you require multiple hosts to be able to send
and receive data to multiple disk arrays. It is this flexibility that has enabled disk array
vendors to copy data between disk arrays without the data going anywhere near the
commanding host.

Fibre Channel Protocol sits below SCSI, IP, and a whole load of other protocols. It is made up
of five levels of hierarchy, each performing a function before passing it to the next level. Only
when the last level receives it does the data get put on the wire.

Fibre Channel Hierarchy

These are the five levels of hierarchy, FC-4 through FC-0:

• FC-4 - This level is the interface to the protocol above it. Since we are only interested
in disks, this will be the SCSI-3 protocol. It maps SCSI functions to the constructs
used by the FCP.
• FC-3 - This level is not well defined. The FCP standard is still being developed, and
this level may be enhanced or removed altogether.
• FC-2 - This level is the most complex part of the FCP. It adds the different classes of
service, breaks packets down and adds sequencing for reassembly, and handles error
detection, coordination between various hosts, flow control, and, more importantly,
physical address assignment.
• FC-1 - This level provides bit, byte, and word alignment to enable the receiving
end to detect transmission errors easily.
• FC-0 - This is the physical interface to the cable. It is here that the electrical signals
are sent.

If this is already more than you need to know, that's OK. The following section is the
interesting bit.

Types of Fibre Channel Networks

There are three ways to implement a Fibre Channel Network for data storage:

• A point-to-point connection between a host and an array.
• A network of multiple hosts and arrays called a Switched Fabric.
• A ring topology called an Arbitrated Loop allowing multiple hosts and arrays to pass
data in a loop without the switched elements.

Point-to-Point

There's not much to say here. The host sends a request to an array, and the array sends back
the results.

It does guarantee maximum bandwidth, availability, and freedom from collisions and
congestion, but it is an inefficient use of hardware since few applications can drive 100
MB/sec.

Switched Fabric

This topology consists of two or more hosts and/or two or more arrays, now termed ports.
Each port is connected to a switch. A switch may connect to another switch. The switches
receive and transmit data to and from the ports, based on the addressing information added by
the FC-2 stage. It may route directly to the destination port or through another switch. You
can have up to 16 million ports.

Fabrics make better use of the bandwidth, but the complex switches are quite expensive.

Arbitrated Loop

Multiple ports can be connected together in a loop configuration. The connected ports have to
check the loop when sending data to ensure no other port is using it, and each port has to check
every piece of data sent to see whether it is intended for it. You can have up to 126 ports on one
loop.

Waiting for the wire to be available and checking every bit of data make the loop less
efficient than a fabric topology, but it is cheaper to implement. This provides a compromise
between point-to-point and Switched Fabric.

When an Arbitrated Loop is used, you will see the reference FC-AL.

Mixed Topology

There is also no reason why you cannot mix fabric with Arbitrated Loop. It is a matter of price
versus performance.

Host Connections

As we know, a host bus adapter can send the data at 100 MB/sec on Fibre Channel, so would
we ever need more than one? The answer is yes.

We may only send a maximum of 50 MB/sec, but performance is all about bottlenecks. When
you remove one, you start looking for the next one. Fibre Channel might not be a bottleneck,
but moving requests from the CPU to the host bus adapter might be. I say might, because your
processors could be so slow that the HBA sits idle for most of the time.

The other reason for having two is failover. Disk array vendors go to great lengths to
ensure the arrays have no single point of failure, and neither should you.

I would say that the benefits of having two HBAs are 10% for failover and 90% for
performance.

Logical Volume Manager

When UNIX was first developed, disk drives were small and I/O throughput was not
important. Utilities were developed to split these disks into a number of slices. Each slice was
created as either a file system to store files or a raw volume to store raw binary data.

Soon the requirement for larger disks grew, hitting limits that the then-current versions of
UNIX could not cope with; the limit on the number of slices per disk was also being outgrown.
To satisfy the increase in disk storage, Logical Volume Managers (LVMs) were developed to
ease the managing and configuring of disks.

I am not saying that you cannot attach an array to a host without an LVM. Without one, the
kernel simply sees each logical disk created in the array as it would any other target and LUN.

Logical volumes are physical disks split into equal-sized pieces and presented as if they were
individual disks. For example, one 47 GB disk might appear to UNIX as five 9 GB disks.

Before you object that 5 multiplied by 9 equals 45, remember that the majority of the computing
world treats 1K as 1024 bytes, while disk drive vendors treat 1K as 1000 bytes. This is a point
to note when buying disk drives.

While the UNIX vendors were developing their own LVMs, third-party Volume Managers
were appearing, each providing gains in performance and functionality.

Advantages of a Logical Volume Manager

• LVMs simplify adding and removing volumes, as well as reconfiguring their sizes,
without having to copy the data out and load it back in.
• They provide failover when multiple host bus adapters are used.
• LVMs also support RAID standards. RAID is discussed later in this chapter.
• Disks and raw partitions can be given meaningful names.
• LVMs provide I/O statistics.

Disadvantages of a Logical Volume Manager

An LVM does not load balance across host bus adapters. Most disk array vendors provide
software to enhance I/O functionality that is not supported by the operating system.

If the host is part of a Storage Area Network with multiple hosts and arrays, the volumes that
each host sees are the same for everyone. Volume Manager will freely write data to a volume
that is used by another host. Again, disk array vendors provide software to prevent hosts from
overwriting each other's data.

If any RAID options are used, it adds CPU and I/O overhead to the host. Striping is the only
option worth implementing. In fact, now is a good time to discuss RAID standards.

RAID Standards

RAID stands for 'redundant array of inexpensive disks.' The term initially described the disk
arrays themselves, but various methods of storing the data soon began to appear, along with a
requirement for resilience in the event of a disk failure. These storage methods and resilience
requirements brought about the different RAID levels.

RAID Levels

RAID levels are standards for improving either the throughput or the resilience of the
disk array. They work for both raw partitions and file systems, but to simplify the explanation, I
will refer only to raw partitions.

Below are the RAID levels most commonly used with ASE today.

RAID 0 (Striping) - A raw partition is split evenly over two or more logical volumes. The
idea is that while one disk is servicing an I/O, the next disk in the stripe set can be processing
the next I/O. This has the benefit of enabling simultaneous I/O.

RAID 1 (Mirroring) - Two copies of the same logical volume are kept on different physical
disks. In the event of a disk failure, the data is still available on the good disk, preventing
downtime and data loss.

RAID 0-1 (Striped Mirrors) - This takes the best of both worlds, incorporating the speed of
striping with the resilience of mirroring.

RAID 5 (Striping with parity) - From a cost point of view, mirroring doubles the number of
disks your application requires. RAID 5 has the read performance of striping but gains the
resilience of mirroring by writing parity information for every write across all logical
volumes in the RAID 5 set. This means that if a disk is lost, its contents can be reconstructed
from the remaining data and parity information.

Until recently, disk arrays knew nothing of the RAID levels specified above. The interface
simply received read and write commands for data on, say, this disk, this sector, or this block.

Logical Volume Manager Support for RAID Levels

RAID 0

With striping, the LVM determines where to issue the write I/O based on the definition of the
striped raw partition. Each striped partition has a stripe size. This is the amount of data that is
written to a logical volume before the LVM switches to the next logical volume.

Given that seek and latency are a big percentage of the overall time taken to perform an I/O,
once the disk heads are positioned, we ought to write as much data as we can before the
heads move again. The obvious unit would be the size of a track, since writing a track does
not require the heads to move.

It would also make sense to write an entire cylinder, since this requires only a small
amount of seek time and no latency. This is true, but there is a catch. A physical disk has a
small amount of on-board cache where I/O requests are stored. If this cache is larger than a
track but smaller than a cylinder, configuring the stripe size to the size of the cylinder will
cause I/O requests to be queued. The idea is to find a happy medium between the two:
configure a stripe size that is a multiple of the track size. Remember that the cache stores
the I/O requests as well as the data, so allow for this. Most SCSI disks can queue 64
commands or more.

The outcome is reduced head movement without I/O requests queuing up because the disk's
on-board cache is full.
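
To make this concrete, here is a sketch of creating a striped logical volume; the command and
flags are those of HP-UX's lvcreate (-i for the number of disks to stripe across, -I for the
stripe size in KB, -L for the size in MB), and the volume group, size, and stripe values are
only examples, so check your own Volume Manager's documentation:

# Stripe a 4 GB logical volume across 8 disks with a 64K stripe size
lvcreate -i 8 -I 64 -L 4096 -n lvol_data /dev/vg01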

RAID 1

With mirroring, when the LVM device driver receives a write request, the LVM will issue
another write request for the mirrored volume. This means that for every write I/O issued
from the application, an additional I/O is generated, doubling the traffic.

It is a trade-off between maximum throughput and maximum resilience.

RAID 0-1

The LVM incorporates both RAID 0 and RAID 1 to provide increased throughput and
resilience.

RAID 5

Simply put, when a write request is received, the LVM performs an Exclusive OR (XOR) logical
operation on the data and writes the parity information to one of the other logical volumes in
the RAID 5 set. To increase performance, every logical volume is used in turn to store parity
information so that a single parity disk does not become the bottleneck.

This is an improvement on RAID 4, which wrote all the parity information to the same disk
(which is why RAID 4 is not typically supported by Logical Volume Managers).

The biggest drawback of RAID 5 is that for every write I/O, the LVM has to read every block
of data that is XORed with the block being written and recalculate the parity. It then sends a
write request to disk for the original I/O and another for the new parity information. So for
every write I/O request, the LVM has to generate extra reads and writes.

Moving Functionality from the LVM into the Disk Array

As time went on, disk array vendors started adding more functionality to the disk arrays to
keep the I/O from going back and forth between the array and the host so often.

The biggest breakthroughs in performance were a built-in cache to store data and a battery
backup facility to prevent data loss. The cache would hold data, so in the event that the same
data is needed, it can be retrieved from the cache at memory speeds rather than from the disk.
The cache is only a certain size, so not all the data can be cached. Writes can be written to
cache also, with an acknowledgement being sent back to the host. The data can then be de-
staged to disk at a later time.

Mirroring, which appears transparent to the operating system and LVM, was incorporated into
the disk arrays. This negates the need for the LVM to send an additional I/O write request.

The firmware inside disk arrays has some intelligent data-fetching algorithms and can
improve read performance for mirrored logical volumes. It can load balance the reads across
the primary physical disk and the mirrored physical disk.

Some disk arrays are appearing with RAID 5 controllers built in. This enables a group of
logical volumes in the RAID 5 set to appear to the LVM as a single logical volume. This
removes the need for the parity information to go back and forth between the LVM and disk
array. A single write request will generate one write from the LVM.

I/O from Beginning to End

So far, I have written a great deal about the hardware and how it fits together. I have
explained various standards and technologies, and hopefully you have a clearer picture in
your head of all the pieces to allow you to make better judgments when it comes to ironing
out your I/O problems.

Before I discuss how ASE performs various I/O activities, I want to describe an I/O request
from beginning to end.

When requesting a read or write, there are two ways to issue the command - synchronous or
asynchronous.

Synchronous I/O Request

This is, by far, the easiest method to understand and to code.


A process will issue a read or write system call to the operating system and then enter a sleep
state. At this stage, the kernel will continue serving other processes. When the I/O request has
completed, the kernel will wake up the user process to continue executing the next sequence
of commands.

The problem with this method is that processes are sleeping during each I/O request.

Asynchronous I/O Request

First of all, a read or write request from the host bus adapter's point of view is identical for
synchronous and asynchronous I/O requests. An application using asynchronous I/O is more
complicated, as it has to keep track itself of the I/O it has issued.

A process will issue a read or write system call to the operating system, but instead of
entering a sleep state, it can continue executing. It may decide to issue further I/O requests, as
long as it keeps track of what it has issued.

Sooner or later, the application needs to know when an I/O request has completed. There
are three ways a process can be informed. One way is for the application to scan all its
outstanding I/O memory structures, looking for statuses that indicate a completed I/O.
Alternatively, it can set up a signal handler and wait for the kernel to send a signal to the
process to indicate an I/O has completed. Another way is to call a wait command that blocks
until an outstanding I/O operation has completed. This may sound a bit strange, since this is
what a synchronous I/O does, but your application may be at a point in its code where it
cannot continue until a specific I/O request has completed. A good example of this is when
Sybase performs a commit transaction.

Of these three options, ASE uses the signal handler to check for any outstanding I/O.

It is worth noting here that Sybase allows each process only one outstanding I/O at any time.
This might sound strange, but ASE is only aware of the next page in a chain once it has read
the preceding one. See the section titled 'Asynchronous Prefetch' for a way around this
limitation.

Once the application receives an acknowledgement for an I/O, it needs to retrieve the data.

Even though the application may have been informed of a completed I/O by checking statuses
or by the receipt of a signal, it still has to issue the wait command to retrieve the data. The wait
command will obviously not wait, as the I/O is already known to have completed; it is just the
way the asynchronous I/O library routines have been written. I only mention this so you are not
fooled if you see it happening when doing some investigating with UNIX trace commands.
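
If you want to see this for yourself, attach a system call tracer to a running dataserver engine
and watch for the asynchronous I/O and wait calls; the tool varies by platform (truss on
Solaris, tusc on HP-UX), and the process ID below is of course hypothetical:

# Trace the system calls made by an existing engine process
truss -p 12345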

Requests

Unfortunately, not all operating systems are the same. I have tried to keep much of this topic
non-vendor specific, but the description that follows is based on the way HP-UX issues I/O. To
get an in-depth idea of how an I/O is processed on Solaris, AIX, etc., review the vendor's
manuals on writing device drivers; they describe it in detail.

The hardware used here consists of an HP server running a Logical Volume Manager,
connected to a disk array using Fibre Channel. The disk array has some built-in cache.

Read Request

A standard read request consists of specifying a file descriptor, which is a code structure
representing a single raw partition, and the amount of data to read. This amount is termed the
'I/O size.'

The application opens a raw partition (for example, /dev/vg01/rRAW_FILE) and is given a
file descriptor. The user process issues a read system call, specifying the file descriptor and
the unit of I/O in bytes. It does not matter whether it is a synchronous or asynchronous read.

Associated with each raw partition are a major and a minor number. You can display these by
running:

ls -l /dev/vg01/rRAW_FILE

The decimal number is the major number, and the hexadecimal number is the minor number.
These numbers are used by the kernel to determine the location of the disk device driver and
to set driver-specific characteristics. They are held in the inode for the file, where the
kernel reads them.
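
For example, the listing might look something like this (the numbers are purely illustrative);
the 64 is the major number and 0x010002 the minor number:

ls -l /dev/vg01/rRAW_FILE
crw-r-----   1 root     sys      64 0x010002 Jan 10 12:00 /dev/vg01/rRAW_FILE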

When a device driver is installed, it adds an entry to the Device Switch Table to provide entry
points into the driver's code. The kernel reads the information from the table, using the major
number to locate the entry, and sets any parameters based on the minor number. In our case, the
device driver is a Logical Volume Manager pseudo-driver. This driver slots itself in before
the host bus adapter (HBA) device driver to convert logical addresses to physical addresses and
route requests through specific HBAs.

Based on the major and minor number, the LVM locates the logical volume where the raw
partition resides. It will convert the request to a block number that the disk array understands
and pass the request to the HBA. The HBA is the initiator, and the disk array is the target. The
driver begins processing the I/O request by initializing data structures and requesting a SCSI
command service from the Fibre Channel Protocol (FCP).

The FCP builds and initiates a sequence, sending an FCP command over the link to the
receiving target. This sequence contains addressing information to direct the command to the
correct disk array in the event it was part of a Fibre Channel fabric. The initiator is free to
send more commands or wait. It can send up to 256 I/O requests to each logical volume
before the requests need to be queued. Usually the target is a disk drive, but in a disk array,
the target is a Fibre Channel connector. SCSI buses are inside the disk array connecting the
disks to the Fibre Channel controllers. The moving of data in the array is controlled by the
disk array's firmware.

The receiving target reads the command to determine what it needs to do. If the data is
already in cache, the target does not need to request it from disk; we will assume it is not in
cache. The target formats the command to its own proprietary standard and sends the read
request to the applicable disk. At some later stage, the data retrieved from disk is copied to
the cache, from where it is sent back to the initiator.

When the disk array's controller is ready to transfer the data to the initiator, it checks to see if
the channel is free. If it is, it transmits a signal to indicate the data is available and follows it
with the data. Depending on the size of the data, multiple chunks of data might be required for
the operation. Once all the data is sent to the initiator, the target sends a completion signal and
informs the initiator that it is ready for more requests. If the request fails, the initiator may ask
the target again or return an error code. If there was a Storage Area Network Switch in
between, then this would happen for each hop.

Once an I/O is complete, the device driver signals the LVM, which then signals the kernel
to indicate the request is complete. The kernel writes the data into the buffer the user process
has set aside for it. The kernel then indicates to the user process that an I/O has completed.

With asynchronous I/O, the I/O requests may not come back in the order they were requested.
This is because the disk arrays order requests based on head movement to obtain the best
throughput.

Write Request

The write request is identical to the read request, except for the following steps:

• The HBA initiator will send a write request to the disk array when the channel is free.
It will then send the data, possibly breaking it into smaller packets.
• Once received by the array, it is copied into the array's cache and an acknowledgement
is sent back to the initiator.
• At some later stage, either because the cache is filling up or the array is idle, the data
will be sent to the disk.

Benchmarking Your Disk Storage

Now that we understand all there is to know about I/O (or at least we should), let's see how
our hardware performs.

Unless we know what kind of performance our hardware is capable of, how can we possibly
predict how long tasks will take or how much extra capacity we have for growth? We need to
run benchmarks to give us an idea of how our hardware performs. The hard part is deciding
what types of benchmarks will be useful.

The TPC Database Benchmark

The most common type of benchmark is the TPC benchmark.

TPC is the Transaction Processing Performance Council. This is an organization made up of
database vendors who publish performance figures based on how well a specific application runs
with their product. The published figures indicate how well their product runs on a specific
hardware platform and how much you are paying in terms of dollars per transaction.

The benchmark software is freely available, but in my opinion, nothing will be gained by
running it on your hardware. It is like running a word processing package to see how it
performs on a machine that will eventually run a spreadsheet package.

There is no better benchmark than running the application full tilt with all the users bashing
away. All we can do is configure the disks to gain the maximum benefit and the least amount
of maintenance.

Preparation

Before we run any sort of test, we need to be able to capture some performance figures. All
operating systems have tools that output performance figures, so make sure you are
comfortable with them and understand the output. Recall my comments about iostat and make
sure they are up to the job.

The tools need to output the CPU usage and the statistics on the disks. Disk array vendors
provide analyzers to show you the I/O throughput, but some are better than others. These tend
to be quite difficult to interpret unless you understand the internals of the disk array, so a
healthy supply of white papers will be your first port of call.

You will no doubt rerun tests. Don't worry if the figures do not match exactly; they never do.
They will, however, be close, so a large discrepancy (say, 10 MB/sec) would indicate a
problem. Also, if you perform reads, the reads may come from cache the second time around,
so checking the size of the cache against the amount you have read should tell you whether you
are really reading from the disk.

When running performance tools, set the recording interval to five minutes. If the figures
output are averaged too much, reduce the interval to one or two minutes, or even less.
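
For example, the standard UNIX tools can be run at a five-minute interval directly; the
interval and sample count below are just a starting point:

# Disk activity every 300 seconds, 12 samples (one hour)
sar -d 300 12

# Per-disk throughput at the same interval
iostat 300 12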

You will be surprised how much load a performance tool puts on a system just to run. Some
can add 50% to the overall running time. These tend to be used more for gathering statistics
for capacity planning purposes than sorting out performance problems.

Make sure that you are the only one using the hardware. You will never have another
opportunity to try this once the machine is in use, so make the most of it. Also, make sure that
the kernel is configured the way it will be used in production.

What Are You Going to Test?

The following tests are for striped sets. If you know that you will be using RAID 5, then
configure the raw partitions as RAID 5. I will assume that you are using stripes.

Single Disk

Knowing the limitations of a single disk will at least let you know what the worst
performance is likely to be. When configuring arrays, you will need to configure stripe sets to
get the performance of, at worst, RAID 5 sets.

Multiple Disks

Next, we will test striping over a group of disks. The idea is to get concurrent I/O, so the
number of disks in the set will be equal to the number of internal adapters, each servicing a set
number of disks. For example, the array could have 64 disks and eight controllers, so each
controller services eight disks. The stripe set would then be made up of eight disks, one from
each controller. Also, some controllers have multiple CPUs, so the stripe set could double in
size.

First Benchmark - Single Disk

The first test is to find out just how fast you can read and write to a disk. If you know what
disks you are using, you should have a rough idea what the speed will be. Create a 4 GB raw
partition; this will enable you to run a test for at least five minutes. If your disk array has a
cache, then you know the first part of the test will be faster. You can ignore the first two
minutes of the test, or you can calculate when you are likely to fill it up and start monitoring
from that point.

You will need a program to issue reads and writes. The UNIX command dd can be used for
both. If you want to run a sequential test, it is simply a case of running from the beginning of
the raw partition to the end. For a random test, you will have to script a list of dd commands
that start and stop at different blocks. There are free benchmark tools on the market if you
want something that will report back what it thinks it has done. Try not to use too many at
once, or it may confuse you.

Write Test

We will use dd to run some sequential read and write tests. Note, however, that dd issues
reads and writes using synchronous I/O, so we will have to run multiple dd commands to
saturate the disk.

Try running groups of 5, 10, 15, 20, and so on in the background until the I/O throughput stops
increasing. At that point, either the disk has reached its limit or the CPU is at 100%.

Test the I/O throughput for different block sizes. You can do this by specifying /dev/zero as
your input file, which writes binary zeroes over the entire raw partition. Check the CPU usage,
although I cannot envision it hitting 100%.

For example:

dd if=/dev/zero of=/dev/rRAWDISK bs=blocksize
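
The command above issues a single sequential write. A rough Bourne shell sketch for kicking
off a group of ten writes in the background, each aimed at its own region of the raw partition,
might look like this (the block size, count, and offsets are placeholders to suit your partition
size):

# Start 10 dd writes in the background, each seeking to its own region, then wait
i=0
while [ $i -lt 10 ]
do
    dd if=/dev/zero of=/dev/rRAWDISK bs=64k count=5000 seek=`expr $i \* 5000` &
    i=`expr $i + 1`
done
wait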

Read Test

With the read test, we can write the output to /dev/null. Give each dd a different skip value so
they are not all reading from cache. You will need to perform the same procedures as you did
for the write tests.

For example:

dd if=/dev/rRAWDISK of=/dev/null bs=blocksize skip=n
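
For a crude random-read test, give each dd its own skip offset so the requests land on
different parts of the partition; the offsets, block size, and count below are arbitrary
examples:

# Ten concurrent reads, each starting at a different 64K block offset
for offset in 100 5200 9800 15000 21000 26500 31000 38000 44000 50000
do
    dd if=/dev/rRAWDISK of=/dev/null bs=64k count=1000 skip=$offset &
done
wait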

Second Benchmark - Multiple Disks

This benchmark will take a bit more effort since we are testing two things: The first is the
block size, and the second is the stripe width. The stripe width is the amount of data that will
be written to each disk before it moves on to the next disk in the set.
When I discussed disks, I mentioned that a stripe width should be a whole number of tracks or
cylinders, based on the amount of cache the disk controller has. This is great for sequential
writes because the disk heads do not have to move as much. With random writes, the I/O
requests jump around between disks and throughput drops.

Repeat the same tests that you performed for a single disk and note the difference in
throughput. Also, it would only be worth running the same number of dds that gave you the
best performance.

Results from dd

After you have run your tests, you will have a table that looks roughly like the one below. It is
up to you what you want to log, but you may want to include things like I/Os per second, KB
per second, and the total after a fixed amount of time.

Test                            Read                        Write
                                16K   64K   128K   CPU      16K   64K   128K   CPU
Single Disk - 1 dd
Single Disk - 10 dds
Single Disk - 20 dds
Single Disk - 30 dds
Multiple Disks - 32K Stripe
Multiple Disks - 64K Stripe
Multiple Disks - 128K Stripe

You now have an idea of the performance of your disks. The reality is that you will never
hit these figures in your application. It is more realistic to aim for 40% of these figures when
running tests within ASE. The reason is that these tests make no use of the data, whereas in
real life, additional I/O and CPU processing would have taken place. When data is manipulated
in memory, you have to add extra time to process it; this is where the 40% figure comes from.

The next set of benchmarks to perform are carried out on the ASE server itself.
Before the server is built, though, you need to decide how you will create your raw partitions.

The figures in the previous table should tell you which stripe width is best. The best
performance comes from stripes. If you decide on mirrored stripes, then all is well and good.
If for one reason or another you decide to go with RAID 5, then you can skip the
next section on striping.

Striping for Performance

You might think that there is not much to say when it comes to striping: you select a group of
disks, select the stripe width, and away you go. This is the approach many people take,
especially on test systems where performance is not an issue. The problem arises when you
have to decide which disks to pick.

Disk Layouts

To get as much I/O through as possible, the idea has always been to get as many physical
disks involved as possible, where the number of disks would match the number of disk
adapters. If each adapter serviced a group of disks, the DBA would lay out his raw partitions
covering one of the disks on each adapter until all the disks are used.

The DBA would then look at the application and segment tables and indexes across these raw
partitions, fine-tuning the layout and reducing hot spots. This fine-tuning would focus on the
most critical transactions.

The disk layout has now become so specialized that adding future disk space would need
careful planning. Decisions cannot be made at the application level without considering the
impact at the disk level. Even changing join clauses could upset the balance.

So how can we have a disk layout for performance that will not degrade over time? The
answer is wide thin stripes.

Wide Thin Stripes

The idea behind wide thin stripes is that every table and index is striped over so many disks
that it is impossible to get hot spots. If your array has 64 disks, then stripe each raw partition
over 64 disks. This way, part of every table and every index is on every disk.

In the 1800s, economist Vilfredo Pareto devised the 80/20 Principle. It has proven to be
correct in a number of areas, and with disk I/O it is fair to say that 80% of all disk accesses
are confined to 20% of the data. This is why caches work so well.

You may think this will increase accesses to different portions of the same disk and increase
seek and latency times. It might for a single process, but today's servers run hundreds
of clients simultaneously, all executing transactions at different times. A nice, efficient table
scan performing sequential I/O can be broken up by a concurrent index lookup on the same
table; even two sequential lookups on the same table starting at slightly different times can
upset the I/O balance.

It is this concurrent access pattern that kills I/O performance. You can see from your
benchmarks that sequential is far better than random, but sequential access covers only a
small percentage of disk activity.

If you are able to run a performance tool that displays I/O for each physical disk in the array,
run it over a period of time and see how many disks are being used per second. You will
probably find that no more than 10% of the disks are being used at any one time.

The more you know about the application, the better you can decide what the best overall
setup will be. I say overall because a carefully planned disk layout will outperform a wide thin
stripe layout in the beginning, but as the database grows, its I/O will degrade over time. Wide
thin stripes will fluctuate marginally and never degrade.

Sybase Kernel Scheduling

Before you can appreciate large page sizes, sp_sysmon, asynchronous prefetch, etc., you need
to understand how the ASE kernel schedules processes and I/O.

The following explanation is a simplistic view of this process.

Each ASE engine runs as an operating system (OS) thread, and like any OS thread, it takes
turns running on the CPUs. ASE was built to perform its own scheduling of tasks for all
ASE connections, acting like its own OS. Since ASE does its own scheduling, it will always
appear busy to the OS, even when it is not running any ASE tasks. This is because it
loops around checking for any new connections, disk I/O, network I/O, and tasks ready to be
executed.

This loop is controlled by the parameter:

sp_configure "runnable process search count"

The default is 2000 repetitions.

Each time around the loop, ASE does the following:

1. Checks for a task to run.
2. If there are no tasks to run, it decrements the loop count and checks for any network or disk I/O.
3. If after 2000 loops there were no tasks to process, then it yields the ASE process to the OS.

The larger the value, the less often ASE will give up a CPU to the OS. If your host only runs
ASE, setting the value to 1 will cause it to never yield. Setting to a higher value causes ASE
to hold on to the CPU longer, reducing the time OS threads can use it.

If there were always a task to run, the kernel would spend all its time running tasks and
never check for disk or network I/O. So ASE has a second configuration parameter that
indicates when to stop checking for runnable tasks and check for any disk or network I/O:

sp_configure "I/O polling process count"

The default is ten tasks.

This indicates that, regardless of whether there are tasks to execute, ASE looks for network and
disk I/O after ten tasks have been run. By reducing this value, the kernel checks for I/O more
frequently when the server is busy.

Context Switching

Context switching is the term used to describe a task going from a running to a sleeping state.
This is typically because either a resource is not available, so the task sleeps until it becomes
available, or the task's timeslice has been exceeded and it voluntarily frees the engine to
enable another task to use it.

A timeslice is an allotted amount of time for a task to perform its work. By default, a task can
execute for a tenth of a second, after which it must yield the engine. This is controlled by:

sp_configure "time slice"

The default is 100 milliseconds.

To understand a timeslice, we need to understand how the ASE kernel keeps track of time.

How ASE Kernel Keeps Track of Time

ASE keeps track of time by receiving a signal from the OS at predefined intervals. The default
interval is a tenth of a second, the same as the timeslice. This is controlled by the clock tick
length:

sp_configure "sql server clock tick length"

Note that the default is 100,000 microseconds.

Timeslice and CPU Grace Time

When a task begins executing, its execution counter is set to the timeslice divided by the clock
tick length. By default, this will be one, since both are a tenth of a second.

Each time the OS signals ASE, it subtracts one from the task's execution counter. If the
execution counter is less than zero, it marks the task as yieldable. It may seem strange that it
checks for less than zero rather than zero, but the task could have begun executing halfway
through a clock tick; if ASE checked for zero, a task might not get a full timeslice. Giving a
task a bit more time is not a problem, as most tasks will context switch waiting for a resource
before the timeslice has been exceeded.

The only way a task can check whether it has exceeded its timeslice is when it gets to a point in
the ASE code called a yield point. If a task has exceeded its timeslice and has not yielded (i.e., it
has not reached a yield point in the code), the kernel will allow it some grace time. CPU grace
time is the period of time the kernel allows the task to continue executing, but it must yield at
the earliest opportunity. The length of time is controlled by:

sp_configure "cpu grace time"

Note that the default is 500 milliseconds, or half a second.

If after 500 milliseconds the process has not yielded, the kernel will kill the process and send
a stack trace to the errorlog, indicating a timeslice error.

By increasing the clock tick length, timeslice, or grace time, CPU-intensive tasks get more CPU
time, but the kernel checks for I/O less often. Be very prudent if you decide to alter these
parameters.

Try not to alter the time slice value, since it will affect all tasks on ASE. Instead, try changing
the cpu grace time parameter, which will affect only the tasks that need the extra time.

Once you understand kernel scheduling, the output from sp_sysmon will make more sense.

Asynchronous Prefetch

Asynchronous prefetch (APF) is an enhancement that enables a process in ASE to issue multiple
physical read requests at a time. This helps to reduce the number of times a process has to sleep
waiting for an I/O. APF can use well-defined access patterns to retrieve I/O or can issue
speculative guesses.

Traditional cache management waits for a process to request data, resulting in a serial
process. The service times for these requests are dominated by seek and latency times. With
proactive cache management, data is retrieved in parallel while processes are busy doing other
work.

ASE uses a governor to monitor the pages retrieved using APF to ensure the best performance
is being achieved.

If the workload is mostly random in nature, APF may not improve performance very much,
much like adding extra processors for a single-threaded program. Also, if the I/O subsystem is
already saturated, issuing extra I/O will add to the problem.

Retrieving too much data can flush useful data out, so careful planning is needed. Even if
APF reads some pages that are never used, the advantages of using APF still outweigh the
disadvantages. APF may, however, disable itself if it detects it is retrieving a large amount of
pages that it does not need.

The types of access where APF can improve performance are:

• Table scans
• Clustered index range scans
• Covered non-clustered queries
• Non-clustered index accesses
• dbccs
• Update statistics
• Recovery

Let's use the table scan as an example to describe what happens inside ASE without APF: The
server reads the first page of the table and checks what the next page in the chain is. If that
page is not in cache, an I/O is issued and the process is put to sleep. When the process reaches
the top of the run queue, it checks that the I/O has completed. If it has, it repeats this sequence
again until all the pages in the table have been read.

As you can see, the process only issues one I/O at any time. With APF, it can issue multiple
I/Os, so the user process does not have to sleep as often.

Look-Ahead Sets

APF works by building look-ahead sets. These are lists of the pages APF will use to issue I/O
requests. Depending on the operation being performed, the look-ahead set will be built in
various ways.

Look-ahead sets are deemed predictable when APF knows which pages are needed; scanning
index pages or reading log records are good examples.

Speculative look-ahead sets are used when APF does not know what the next page is, so it makes
a speculative guess. An example is a table scan: ASE only knows what the next page in the
chain is by reading the preceding one, so APF issues I/O for all pages in the allocation unit,
and the governor monitors whether there is any jumping between allocation units due to
fragmentation. It does this to determine how useful the prefetching is and whether it should
continue.

Let's look at the look-ahead sets for the various types of access:

• Table scans, clustered index range scans, covered non-clustered queries - During
these types of queries, APF uses the allocation pages to list the pages used by the
object. The more extents used by the object in the allocation unit, the bigger the look-
ahead set. If the APF governor detects jumping around between allocation units due to
fragmentation, it will reduce the size of the look-ahead set.
• Non-clustered index accesses, update statistics - When retrieving data pages, APF
uses the rows for all qualifying index values. From these, it builds a look-ahead set as
long as two or more rows qualify. APF will also include the next leaf page to speed
I/O along.
• dbccs - Both checkdb and checkalloc will scan the page chains, so the look-ahead set
is the same as that for table scans. If checkdb has to check non-clustered indexes, the
look-ahead set is built the same as the non-clustered index scan above.
• Recovery - During recovery, ASE reads each log page that includes rows for a
transaction and then all the data and index pages referenced by the transaction. The log
is stored sequentially on disk so APF will retrieve all the pages in the allocation unit.
This only works well if data and log are not mixed. As each log page is read, APF
builds a look-ahead set for all the data and index pages referenced. This ensures that
the APF is always one step ahead of the recovery process.

Not all types of log records are read using APF, so do not worry if sp_sysmon shows non-
APF reads.

Configuring APF

The trick with APF is knowing if it will increase performance to the detriment of other tasks.
Reading extra pages while doing a table scan will no doubt improve the table scan, but if in
the process it flushes out static pages, then the server may end up doing more physical I/O
overall.

APF is configured at the pool level. By default, each pool is allowed to have 10% of the pages
it can hold brought in by the APF mechanism. This is controlled by:

sp_configure "global async prefetch limit"

The default is 10.

To see what limits are set for each pool, run:

sp_cacheconfig

To change individual pools, use:

sp_poolconfig

For example, to set the APF limit to 100% for the 2K pool in the default data cache:

sp_poolconfig "default data cache", "2K", "local async prefetch limit=100"

APF and Fetch and Discard Cache Strategy

When a query runs using the fetch and discard (MRU) strategy, the pages are linked at the
wash marker. This is the location in the cache chain where pages are flushed to disk and
reused. This prevents a query from flushing out the cache.

With APF, a query that uses the MRU strategy cannot place the pages at the wash marker
because they may get flushed out before they are used. APF and MRU are treated differently.
The pages are linked at the MRU end of the chain instead of the wash marker. When the
query accesses the page, it is re-linked at the wash marker ready to be flushed.

The downside to this is that unused pages fetched using APF will remain in the cache and will
traverse the chain as normal.

Monitoring APF Using sp_sysmon

This section reports APF activity for all caches combined.

Cache Statistics Summary (All Caches)

Asynchronous Prefetch Activity

                                   per sec    per xact      count  % of total
  APFs Issued                        195.0      1462.3      11698      17.7 %
  APFs Denied Due To
    APF I/O Overloads                  0.2         1.1          9       0.0 %
    APF Limit Overloads                0.0         0.0          0       0.0 %
    APF Reused Overloads               0.2         1.8         14       0.0 %
  APF Buffers Found in Cache
    With Spinlock Held                 9.3        70.0        560       0.8 %
    W/o Spinlock Held                894.0      6704.6      53637      81.4 %
  -------------------------  ------------ ------------ ----------
  Total APFs Requested              1098.6      8239.8      65918

Other Asynchronous Prefetch Statistics

  APFs Used                          193.5      1451.3      11610        n/a
  APF Waits for I/O                  182.2      1366.6      10933        n/a
  APF Discards                         0.0         0.0          0        n/a

• APFs Issued - The number of I/Os of all sizes issued by APF.
• APFs Denied Due To: APF I/O Overloads - This indicates that either there was disk
semaphore contention or there were no free disk I/O structures. If this value is high,
then check the Disk I/O Management section and Disk Activity section of sp_sysmon
for signs of these.
• APFs Denied Due To: APF Limit Overloads - The number of outstanding read
ahead requests yet to be used by the server has exceeded the user-configured value for
a buffer pool.
• APFs Denied Due To: APF Reused Overloads - A page read into the cache was
flushed out before the query could use it.
• APF Buffers Found in Cache: With Spinlock Held - APF has to scan the cache to
ensure it doesn't bring the same page in twice. It tries to do this without holding on to
a spinlock. This indicates the number of times it had to scan the cache with a spinlock
held.
• APF Buffers Found in Cache: W/o Spinlock Held - The number of cache scans
without holding on to a spinlock. This increases performance and is the preferred
method. There is nothing we can do to influence it.
• APFs Used - This indicates the number of pages brought in by APF that were used.
Note, however, that the value here also indicates pages brought in that may have been
retrieved outside of this sampling interval.
• APF Waits for I/O - This reports the number of times a query had to wait for an APF
read to complete, meaning the APF did not issue the request early enough. This could
also mean that ASE is not checking for I/O frequently enough. You will always get some
waits, because the query reads the first page and then waits for the second page that APF
is just starting to fetch, because APF is building a look-ahead set for another allocation
unit, or because the amount of processing to be done on each page is minimal.
• APF Discards - This indicates if pages read by APF were discarded before they were
used. This could indicate the cache is too small or that due to fragmentation, there are
many pages that are not needed.

APF Tips

If you set the sp_configure parameter number of pre-allocated extents to its maximum of 31,
look-ahead sets will be larger, even though you could potentially be wasting space. As long as
the I/O subsystem is not saturated, I/O performance will increase. Fragmentation will also be
reduced, helping many other types of processing, not just APF.

During recovery, ASE uses only the pool of the default logical page size in the default data
cache. To speed recovery, initially set the APF limit on that pool to 100% and increase the
size of the pool. Once recovery is complete, you can set the sizes back to their previous values.

During recovery, APF issues a third lookup ahead of the transaction currently being
processed. It actually reads, by default, 24 pages ahead. If APFs Wait for I/O starts to
increase, you can try to amend the number of pages ahead by running dbcc
tune(apf_lookahead,n).

Always set a large I/O pool for dbcc checkalloc. If you do not have a large pool, then
checkalloc can use large internal buffers to increase performance. However, it will only use
the buffers if you set the APF limit to 0 on the small pool.
Run regular maintenance tasks to reduce fragmentation. This will ensure look-ahead sets are
as large as possible and do not stall.

Sybase ASE has many more counters for asynchronous prefetch that are not shown by
sp_sysmon. Try creating your own script and displaying these during your tests.

Logical Page Sizes

Prior to ASE version 12.5, data was stored on disk in 2K pages, with the ability to read or
write data using 2K, 4K, 8K, and 16K I/O operations. ASE version 12.5 gives you the ability
to store data on disk in pages of 2K, 4K, 8K, or 16K. In addition, the maximum I/O size has
increased from 16K to 32K, 64K, and 128K.

If you run a benchmark with I/O sizes from 2K up to 512K, you will appreciate that you get
better throughput with I/O sizes above 16K. Each I/O returns more data, so less
context switching occurs.

If, on the other hand, you only use a small portion of the data returned, then you could be
flushing data out of cache that will warrant another physical I/O later. This is where you need
to have a good understanding of the application. OLTP applications will work well with a 2K
page size, whereas DSS applications will work better with the larger sizes.

Varying the Page Size

Configuring a server to use a page size greater than 2K is all or nothing. Every database on the
server has to be at that page size, and so the only way to change the page size is by rebuilding
the server. However, I am not going to go into rebuilding servers; I'll concentrate on the
performance side.

To check what page size a server is using, issue:

select @@maxpagesize

You can also check the value of the -z switch passed to the dataserver executable in the
RUN_server file.
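
From the UNIX side you can simply look at the runserver file; the path below is the usual
default for an ASE 12.5 installation and the file name follows the RUN_<servername>
convention, so adjust both for your site:

# Display the dataserver command line; a -z option, if present, shows the page size
cat $SYBASE/$SYBASE_ASE/install/RUN_SYBASE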

Space Allocation

Larger page sizes give better performance, but they increase the minimum amount of space each
object uses.

Below is a list of the differences between space allocations when using the larger page sizes.

Item                                        2K      4K      8K      16K
Number of pages in an extent                8       8       8       8
Size of one extent                          16K     32K     64K     128K
Number of extents in an allocation unit     32      32      32      32
Number of pages in an allocation unit       256     256     256     256
Size of an allocation unit                  0.5MB   1MB     2MB     4MB
Minimum table size                          16K     32K     64K     128K

There are many more differences, but I wanted to point out that allocation units have
increased in size, and tasks that operate at the allocation unit level will perform better because
they read fewer allocation pages. There is always a trade-off with performance: where you
gain in one place, you lose in another. A general rule of thumb is that when you wait for
fewer resources, CPU usage will increase.

What to Look Out for When Using Larger Page Sizes

Before you decide to use larger page sizes, make sure you know if your I/O system can cope
with it. I/O system saturation could be a result of too many I/Os being issued or simply that
your disk storage is not up to the job. Issuing too many I/Os could be forcing out pages from
cache.

When single or multiple rows are retrieved from a table, the server will issue an I/O based on
the page size and the pools available at the time. For a 16K page size, the minimum I/O is
16K. If the server is issuing I/Os greater than this, you will need to monitor the Large I/O
Effectiveness output from sp_sysmon to determine if your page size is too high.

Tasks That Benefit from Large Page Sizes

• Asynchronous prefetch - For me, this is one of the two features that will benefit the
most, as long as your I/O system can cope with the extra load. Allocation pages and
index leaf pages are bigger, so each look-ahead brings in more data, as long as the cache
prefetch limit has not been exceeded.
• Recovery - This is the second most important feature that will benefit from larger I/O,
in my opinion. Those who manage terabyte databases are painfully aware of how long
a recovery process can take. In the past, all recovery I/O was done in 2K, so the performance
came from asynchronous prefetch. With a 16K logical page size the limit is now 16K, and so
that all-important database will come online quicker.
• dbcc checkdb - This has always suffered from the 2K limit when accessing the index
pages. Using asynchronous prefetch will speed things up, but you may need larger
pools to reduce the physical I/O when comparing index rows with data page rows.
• dbcc checkalloc - This dbcc validates objects on allocation pages. It will use a
combination of the page size and largest pool size along with asynchronous prefetch to
get the data. Percentage-wise, the performance increase will be a lot better than
checkdb.

Note Start making use of user-defined caches. If you have read-only tables, then a cache
configured with the relaxed cache strategy will be better.

Large Page Sizes and Locking Issues

The biggest drawback with large logical page sizes is locking at the page level. This will have
a big effect when converting a database to a larger page size. During testing, it is not always
possible to simulate the same level of activity as appears in production, so locking contention
may only prove a problem once you have gone live.

If you are running an OLTP system or a system with a high volume of writes, unless you
switch to data-only locking, remaining at 2K logical page size is your best option.

Note Data-only locking adds around 20% extra CPU utilization and around 15% extra disk space
to process and store each table.

Benchmarking Sybase ASE

At this stage, you will have the server built and raw partitions configured in an optimal way,
and you will have an idea of the expected performance from the I/O system. What you will not
know is how long these tasks take in ASE.

Although I am not going to give you specific tasks to run, I will list what I consider to be I/O-
intensive tasks. It is up to you to have an idea of the length of time it takes to run them (not
just the time it takes to run, but the impact it has when running alongside other tasks).

The reason for running benchmarks is to have a baseline for how long a task should take.
This becomes even more critical when the databases get into the hundreds of gigabytes. If you
have corruption, you may need to run dbcc pglinkage on a table; if the table is 10 GB, you
will know roughly how long it will take.

You should be able to work out that, say, dbcc checkalloc will take five minutes per gigabyte
but will slow to six minutes per gigabyte if a certain application is running.
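
As a rough worked example (the rates are only illustrative): at five minutes per gigabyte, dbcc
checkalloc on a 200 GB database needs about 1,000 minutes, or just under 17 hours; at six
minutes per gigabyte, the same run needs 20 hours, so the competing application adds roughly
three hours to the maintenance window.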

There are two types of benchmarking that you now need to perform:

• Application benchmark
• DBA maintenance benchmark

Application Benchmark

The application benchmark gathers statistics on the application and finds out how long
specific tasks take.

When faced with a new application, I start by talking with the developers or the people who
run the application to find out what the most critical part of the application is. With a
telecommunications application, the critical part is getting the bills out to the customers. With
a banking application, it could be the processing of trades.

Even though we have configured our disks as well as we can, we need to know whether we
are at maximum throughput. Putting aside bad queries, most I/O is generated by running
multiple instances of client applications. An example would be a program that bulk loads data
into a table; it will benefit from running multiple instances.

DBA Maintenance Benchmark

Most DBA maintenance activities will be I/O intensive. These are:


• dbccs
• Backups
• Update statistics
• Create indexes
• Data uploads

These tasks will all generate substantial amounts of I/O. Each will perform physical I/O and
fill the data caches. When using them, you can expect the I/O system and CPU usage to
increase.

• dbccs - When talking about dbcc, we tend to only think about checkalloc,
checkstorage, and checkdb. There are others that generate large amounts of I/O and
CPU usage when dealing with corruption, so have an idea what they are doing before
you run them.

checkalloc scans page chains and validates pointers. The CPU cost tends to be less
than that of a checkdb, which validates rows. checkstorage was developed to reduce the
duration of the run by using parallel processes and omitting the validation of index
rows.

• Backups - Backups tend to run at the speed of the backup device. The backup scans allocation
pages looking for used pages in the allocation unit while keeping track of pages being
updated during phase one of the backup. Stripes enable simultaneous I/O to be issued,
increasing the I/O throughput.
• Create indexes - Creating indexes involves performing table scans and generating
work tables to sort. This places a high demand on I/O, especially if parallel sorts are
enabled. It will also lock the table to ensure it does not conflict with your application.
• Update statistics - With the ability to add statistics on non-indexed columns and all
columns in an index, update statistics can take a lot longer. Each column will be
scanned once to calculate the statistics.
• Data uploads - Data warehouses are usually updated on a periodic basis. Demands for
I/O are high, especially if the data has to be massaged before the users can access it.

Once you understand how long everything runs and the resources each task uses, capacity
planning takes care of itself, and you are not left with a dying system.

The DBA needs to understand where the bottlenecks are and make recommendations.
Performance and tuning is all about bottlenecks. When one is removed, the bottleneck simply
moves on to the next resource. Eventually, you should be left with one of the following
conclusions:

I have reached the limit of my hardware to perform these set tasks by these set times.

I have reached the limit of Sybase ASE, and the bottleneck cannot be removed unless the
software is altered. A classic example is cachelets. If you ever had a system that did no disk
I/O, where everything was in memory and there was lots of logical I/O, the bottleneck was the
spinlock on the cache. Unless you created lots of user-defined caches and bound objects to
those caches, you could not reduce this contention. It was not always practical to do this, so
cachelets provide a way to have multiple spinlocks per cache.

Understanding I/O Statistics in sp_sysmon

This brings us to the final section of the chapter. What do all the numbers mean in sp_sysmon
that relate to I/O?

This also brings us back to the first question: Is 500 I/O per second on a disk device a
problem? Hopefully when you see this, you will know.

Below are snippets from an sp_sysmon I ran. Ignore the numbers; they are for display
purposes. After each section, I will list any tips to improve I/O throughput.

Kernel Utilization
  Disk I/O Checks             per sec      per xact     count       % of total
  --------------------------  ------------ ------------ ----------  ----------
  Total Disk I/O Checks       237111.3     1778334.8    14226678    n/a
  Checks Returning I/O        234903.6     1761776.6    14094213    99.1 %
  Avg Disk I/Os Returned      n/a          n/a          0.00789     n/a

• Total Disk I/O Checks - This is the number of checks for completed I/O that occurred
during the snapshot period. It is controlled by the following sp_configure parameters:
sql server clock tick length, time slice, cpu grace time, runnable process search count,
and i/o polling process count.
• Checks Returning I/O - This message is a bit confusing. It should read, 'Checks that
could possibly return a completed I/O.' It does not indicate the number of completed
I/Os. Each time the kernel checked while I/Os were pending, this value was incremented.
• Avg Disk I/Os Returned - This value is derived by dividing the number of I/Os completed
during the snapshot period by the number of Checks Returning I/O.

Kernel Utilization Tips

The Kernel Utilization section shows how busy the engines are and what type of work the
engines are performing.

Every time a process has to wait for I/O, it will be switched off the engine unless its timeslice
has expired. Low engine usage indicates either no activity or ASE is waiting for a resource.
High usage indicates high activity or that there is no waiting for resources, and processes are
using their allotted timeslice.

sp_sysmon is unable to show how long ASE has been sleeping when switched off the CPU by
the OS or when the runnable process search count limit has been reached. When ASE is
sleeping, it cannot perform any I/O, so to improve performance, you can increase runnable
process search count so it yields less often or set it to zero so it does not yield at all.

ASE will always show around 100% utilization when seen through an OS performance tool,
so you can starve CPU time for other OS processes if ASE yields less often.

I mentioned that I/O polling process count determines how many processes are executed
before ASE will check for I/O. It will always check for I/O if there are no processes to
execute, but if there are multiple processes running that are CPU-bound, you can increase I/O
performance by reducing I/O polling process count to below the default.
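
Both of these are ordinary sp_configure parameters; a quick sketch with purely illustrative
values:

-- let an engine keep looking for runnable tasks longer before yielding to the OS
sp_configure "runnable process search count", 3000
-- on a CPU-bound server, check for completed I/O after fewer executed tasks
sp_configure "i/o polling process count", 5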

Task Management                   per sec      per xact     count       % of total
  ------------------------------  ------------ ------------ ----------  ----------
  Task Context Switches Due To:
    Voluntary Yields              86.9         651.9        5215        6.1 %
    Cache Search Misses           788.8        5916.3       47330       55.7 %
    System Disk Writes            94.9         711.9        5695        6.7 %
    I/O Pacing                    56.6         424.8        3398        4.0 %
    Logical Lock Contention       0.0          0.1          1           0.0 %
    Address Lock Contention       0.0          0.0          0           0.0 %
    Log Semaphore Contention      0.5          3.6          29          0.0 %
    Group Commit Sleeps           18.4         138.3        1106        1.3 %
    Last Log Page Writes          83.8         628.3        5026        5.9 %
    Modify Conflicts              16.8         126.1        1009        1.2 %
    I/O Device Contention         101.0        757.4        6059        7.1 %
    Network Packet Received       1.8          13.4         107         0.1 %
    Network Packet Sent           0.6          4.8          38          0.0 %
    SYSINDEXES Lookup             0.0          0.0          0           0.0 %
    Other Causes                  167.2        1253.9       10031       11.8 %

Task Management shows context switching within the ASE server indicating what causes a
user process to be switched off an engine.

Every user process has an allotted amount of time (a timeslice) in which to perform its work. If
a resource it needs is not available, the process requests the resource and goes to sleep; when
the resource becomes available, it is placed back on the run queue and continues the next time
it is scheduled.

The Voluntary Yields field indicates that a user process put itself to sleep because it used its
entire timeslice. The other fields indicate that the user process went to sleep for other reasons.

The fields relating to I/O are:


• Cache Search Misses - This is the most common reason a user process is switched off an
engine. A page that the process requires is not in cache, so an I/O is issued and the
user process sleeps until the I/O completes.
• I/O Pacing - This figure represents the number of times a task was switched
off an engine due to too many I/Os being issued. Since we know a user process can
only have one outstanding I/O, this must be caused by something else.

There is a limit in Sybase for batching I/Os, which is ten. If a task exceeds ten
outstanding I/Os, the task is switched off the engine. The tasks that can issue more
than one I/O are things like the checkpoint process issuing writes for dirty pages or a
transaction writing out its log pages when it commits.

• Log Semaphore Contention - This value indicates high log usage that is causing I/Os
to be stalled due to another task accessing the log device. Read the description later in
this chapter regarding how ASE uses a semaphore to control access to disk structures
to prevent two processes issuing I/O simultaneously. It also explains how to reduce
semaphore contention.
• Group Commit Sleeps - When a user process commits a transaction, the log pages
containing the transaction have to be flushed to disk. When multiple concurrent transactions
from different users exist, their log records are intermingled in cache, so without group
commit the same log page could be written more than once as each transaction committed.
To reduce flooding of the I/O subsystem, the committing task is put to sleep, and when it
comes to the head of the run queue, any of its log records that are still dirty are written out
at that point. The idea is that another process issues a commit in the meantime, so the pages
only need to be written out once.
• I/O Device Contention - This field indicates a task could not obtain a semaphore for a
non-log device. Read the description later in this chapter regarding how ASE uses a
semaphore to control access to disk structures to prevent two processes issuing I/O
simultaneously. It also explains how to reduce semaphore contention.

Task Management Tips

When you look at the Task Management section of sp_sysmon, the Task Context Switches
Due To list gives the reasons why a user task is switched off an engine. What is not obvious is
that some of the values are made up from more than one counter. By creating your own
version of sp_sysmon, you can break down the counters for a better understanding of what is
causing various values.

I/O pacing is made up of three counters, which when broken down tell you whether the delay
was due to log writes or data page writes.

When Sybase tasks issue I/Os in batches, the default is ten. If your I/O system is not saturated,
then you can increase this value up to a maximum of 50 by running dbcc
tune(maxwritedes,n).
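
For example (30 is only an illustrative value; dbcc tune settings do not survive a restart, so
reissue the command after each boot):

dbcc tune(maxwritedes, 30)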

Group commit tries to reduce the number of times a log page gets written to disk. The
problem is that if you are using a large log I/O size, your user processes may be stalling,
waiting for other processes to commit. In this instance, if you have a low transaction rate and
the log I/O size is not 2K, try reducing it to cause more frequent flushing. If the I/O is
issued in a steady stream, the I/O subsystem may be less saturated.
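
The log I/O size is set per database with sp_logiosize; a brief sketch (the database name is an
example, and "2" assumes the cache the log is bound to has a 2K pool):

use salesdb
-- report the current log I/O size for this database
sp_logiosize
-- set it to 2K so the log is flushed in smaller, more frequent writes
sp_logiosize "2"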

Cache Statistics

Cache Statistics is shown as a summary of all caches and is then broken down for each cache.
The idea is that the more data held in cache, the less physical I/O is needed to reread data,
and more time can be spent writing data.

The following is an extract from the summary portion. I have not shown asynchronous
prefetch, as this is described in its own section. Also, rather than discussing every value, I
have included just enough to describe the I/O issues.

Cache Statistics Summary (All Caches)
-------------------------------------
                                per sec      per xact     count       % of total
                                ------------ ------------ ----------  ----------

  Cache Search Summary
    Total Cache Hits            41512.4      311343.0     2490744     65.3 %
    Total Cache Misses          22097.1      165727.9     1325823     34.7 %
  -------------------------     ------------ ------------ ----------
    Total Cache Searches        63609.5      477070.9     3816567

  Cache Turnover
    Buffers Grabbed             1713.4       12850.1      102801      n/a
    Buffers Grabbed Dirty       0.0          0.0          0           0.0 %

  Cache Strategy Summary
    Cached (LRU) Buffers        40933.4      307000.5     2456004     97.0 %
    Discarded (MRU) Buffers     1269.0       9517.1       76137       3.0 %

  Large I/O Usage
    Large I/Os Performed        438.4        3288.1       26305       92.8 %
    Large I/Os Denied           34.2         256.6        2053        7.2 %
  -------------------------     ------------ ------------ ----------
    Total Large I/O Requests    472.6        3544.8       28358

  Large I/O Effectiveness
    Pages by Lrg I/O Cached     1750.5       13128.5      105028      n/a
    Pages by Lrg I/O Used       1002.6       7519.4       60155       57.3 %

• Cache Hits - Represents a request for a page that is already in cache.
• Cache Misses - When a page is not in cache, it is counted as a miss and a physical I/O is
requested. The request could be anywhere from 2K up to 128K.
• Cache Turnover - This figure gives you a good idea of whether your cache is too small.
Every time a buffer is taken from the LRU end of the cache chain for reuse, this value is
incremented by one. Buffers Grabbed Dirty shows how often that buffer was still dirty,
which incurs a wait while it is flushed to disk.
• Cache Strategy Summary - When a page is requested, depending on the type of
query, ASE may place the page at the wash marker (the MRU, or fetch-and-discard,
strategy) to prevent a runaway query from flushing out pages from the cache. Unless you
have an understanding of the application, it will be difficult to know whether buffers
should be discarded straight away or not.
• Large I/O Usage - Any request for an I/O larger than the logical page size is logged as a
large I/O. Benchmarks show that a 128K I/O provides greater throughput than a 2K I/O.
If a large I/O is requested but some of the pages are already in cache, the request is
demoted to the logical page size. Also, I/O on the extent containing the allocation page
is always done at the logical page size.
• Large I/O Effectiveness - Of the pages brought into cache using a large I/O, some of
them may never be used. Fragmentation is the biggest culprit.

Cache Statistics Tips

When looking at cache misses, ignore the % of total column. The per sec figure indicates the
number of physical I/Os issued per second; 10,000 physical I/Os per second is a problem even
if the percentage is only 0.5%.

If your cache turnover is zero, it could mean your cache is oversized. This value is also a
useful guide when you configure a relaxed-strategy cache, since the relaxed policy only avoids
a performance hit if buffers rarely need to be reused.

Look at cache turnover for each pool size to determine the number of I/Os for each page size.
This will enable you to determine the amount of data retrieved from each device.

Keep object statistics up to date as much as possible to ensure the best cache strategy is used.
Poor choices could also be a result of the optimizer not knowing enough about the data. See
Chapter 5, 'The Optimizer Statistics,' for a better understanding.

Disk I/O Management

  Max Outstanding I/Os          per sec      per xact     count       % of total
  -------------------------     ------------ ------------ ----------  ----------
    Server                      n/a          n/a          101         n/a
    Engine 0                    n/a          n/a          59          n/a
    Engine 1                    n/a          n/a          24          n/a
    Engine 2                    n/a          n/a          57          n/a
    Engine 3                    n/a          n/a          80          n/a
    Engine 4                    n/a          n/a          18          n/a
    Engine 5                    n/a          n/a          0           n/a

  I/Os Delayed by
    Disk I/O Structures         n/a          n/a          0           n/a
    Server Config Limit         n/a          n/a          0           n/a
    Engine Config Limit         n/a          n/a          0           n/a
    Operating System Limit      n/a          n/a          0           n/a

  Total Requested Disk I/Os     1901.0       14257.4      114059

  Completed Disk I/Os
    Engine 0                    308.1        2311.0       18488       16.6 %
    Engine 1                    388.6        2914.4       23315       21.0 %
    Engine 2                    447.9        3358.9       26871       24.2 %
    Engine 3                    439.4        3295.4       26363       23.7 %
    Engine 4                    269.9        2023.9       16191       14.6 %
    Engine 5                    0.0          0.0          0           0.0 %
  -------------------------     ------------ ------------ ----------
  Total Completed I/Os          1853.8       13903.5      111228

These figures represent I/O activity at the engine and server level. The figures will look
different depending on whether you are using synchronous or asynchronous I/O.

With synchronous I/O, each process can only issue one I/O at any time. It does not use disk
structures to control I/O, and so the I/Os Delayed by section will always be zero.

With asynchronous I/O, multiple I/O requests can be issued, and so these figures will be
higher.

• Max Outstanding I/Os - Any I/O issued by a user process or any other process that is
outstanding is shown here.
• I/Os Delayed by - These figures indicate whether an I/O was stalled due to limits
configured in the ASE server or the OS.
• Total Requested Disk I/Os - All I/Os issued regardless of process are indicated here.
• Total Completed I/Os - This shows the number of issued I/Os that completed.

Disk I/O Management Tips

When configuring the number of disk I/O structures for asynchronous I/O, never exceed what
the OS can give you. The server and engine config limits should also be amended from the
default to a more realistic figure. The reason is that if the server tries to request more
outstanding I/O than the OS can provide on a per-process basis, the OS will switch that
process off the CPU. The idea behind asynchronous I/O is that ASE can do other processing
while the I/O completes; ASE will detect the limit, simply put the task to sleep, and carry on
scheduling other processes.

Device Activity Detail

  Device:
    /PUK21SQL_tempdb/PUK21SQL_tempdb
    PUK21SQL_tempdb             per sec      per xact     count       % of total
  -------------------------     ------------ ------------ ----------  ----------
  Reads
    APF                         0.0          0.0          0           0.0 %
    Non-APF                     10.7         80.0         640         50.6 %
  Writes                        10.4         78.1         625         49.4 %
  -------------------------     ------------ ------------ ----------  ----------
  Total I/Os                    21.1         158.1        1265        1.1 %

  Device Semaphore Granted      21.1         158.1        1265        100.0 %
  Device Semaphore Waited       0.0          0.0          0           0.0 %

  -----------------------------------------------------------------------------

sp_sysmon outputs the above for every disk device in sysdevices.

ASE keeps track of every device used for I/O by allocating a descriptor for each. The
descriptors are memory structures containing information for reading and writing to the
device.

Semaphores are a locking mechanism to prevent two processes from updating the same
information simultaneously. This is not the locking ASE uses on pages; this locking is on
memory structures that keep track of the multiprocessing occurring in ASE. When ASE
performs I/O, the task has to fill in a block I/O structure. The semaphore prevents two
processes trying to send I/Os to the same device simultaneously.

• Reads - APF - This is the cumulative value for every read request that occurred on
this device during the sampling period. This only includes the 2K to 128K reads that
were issued by the asynchronous prefetch algorithm.
• Reads - Non-APF - This is the cumulative value for every read request that occurred
on this device during the sampling period. This includes 2K to 128K reads issued by
each user process.
• Writes - This is the cumulative value for every write request that occurred on this
device during the sampling period. This includes 2K to 128K writes issued by each
user process and by processes such as the housekeeper and checkpoint process.
• Device Semaphore Granted - The number of I/Os that did not have to wait for the
semaphore.
• Device Semaphore Waited - The number of I/Os that were held up due to another
process acquiring the semaphore.

Device Activity Detail Tips

ASE allows a database to have up to 128 fragments, each represented by a row in sysusages. If
each device were 4 GB, the database could only be a maximum of 512 GB in size. If you
require a larger database, then you need to create larger devices. The problem with this is that
each device has only one semaphore, and if a device is, say, 32 GB, then unless you use
segments, you could have all the activity going to the same device, causing semaphore
contention.

The trick here is to compromise. If you are getting semaphore waits, then either create
segments to distribute the I/O or recreate your devices smaller. The chances are that if you
have large databases, you will end up with a combination of segments and varying device
sizes.
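
A minimal sketch of the segment approach (the device, database, segment, and table names are
all examples, and the device must already be part of the database):

use salesdb
-- create a segment on a second device and place a heavily written table on it
sp_addsegment "hot_seg", "salesdb", "data_dev2"

create table order_detail (
    order_id int      not null,
    line_no  smallint not null,
    qty      int      not null
) on hot_seg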

Chapter 4: Indexes
Why Use Indexes?

The default mechanism for searching through data in the Adaptive Server is a table scan. This
means that the table is treated like a list and searched from top to bottom. This is a perfectly
reasonable method to find information when the table is very small, but as the table grows, it
becomes increasingly inefficient because scans are, by nature, an indirect way of accessing
data rows. If, for example, a table has 3,000,000 entries and a query is performed to retrieve
the 2,999,990th row, a user must wait until the server finishes reading the first 2,999,989 rows
of data. Depending on the speed of the server, this can be a very lengthy affair.

Indexes speed up data retrieval because they enable direct access to data rows. Table indexes
function like the index of a book: Information is organized based on criteria specified
when the index is created, and pointers are built to the data.

There are two key reasons to use indexes in an Adaptive Server database:

• To maintain uniqueness
• To provide faster access to data

An index can be used to enforce uniqueness in the column(s) that comprise the index's key.
For a unique index, ASE will ensure that there are no duplicate values stored in the index's
key column(s) and, if relevant, only one row has null value(s). All primary key constraints
include the creation of a unique index. However, the primary reason indexes are created is to
improve query performance.

While an index is not required for accessing data, it may be needed to access data efficiently.
If a table has no indexes, Adaptive Server will be forced to retrieve rows via a table scan. On
a small table, this may not be terribly noticeable. However, on large tables, the scan can cause
serious delays in returning results.

Index Usage Criteria

While it is possible to build an index that contains all of a table's columns, it is usually not
advantageous to do so, though there are exceptions. What is crucial, however, is that the index
must be usable by the applications' queries. In order for ASE to consider using an index, the
query must contain a search argument (SARG) or a join clause that matches one or more
columns in the index, including, but not limited to, the first column in the index.

Example:

create index idx1 on employee (division, dept, empl_type)

Each of the following queries might be able to use this index:


select * from employee
where division = 'accounting'
and dept = 'finance'

select * from employee


where division = 'accounting'
and empl_type = 'exempt'

select * from employee


where division = 'accounting'

However, the following query cannot:

select * from employee


where empl_type = 'exempt'

Indexes and Performance

While indexes are instrumental in improving the performance of queries, they can also slow
the performance of inserts, deletes, and updates. Every index built on a table must be
maintained, so as a table is modified, the indexes must also be updated to reflect the changes.
This not only means that an update, insert, or delete will take longer to complete because of
the additional overhead, but it also can have a pronounced impact on concurrency and your
application's throughput, since index page updates are a primary cause of process blocking.
Consequently, a review of the table's activity profile should be part of any index design
process. Always keep the number of indexes on frequently modified tables to the absolute
minimum in an OLTP (online transaction processing) environment, where tables may be
modified thousands of times a second. The additional impact of having indexes may seriously
affect the overall performance of an application. Often, the only indexes present are those
required to support the update transactions and to enforce uniqueness for the tables.

If a table is modified infrequently, like the tables in a DSS (decision support system) database,
having too many indexes is not really a performance issue. In this case, the only issue will be
how much disk space you can afford to allocate to index creation, because in some cases the
space used by the indexes can significantly exceed the space used by the tables themselves.

Determining Index Usefulness

When a query is sent to ASE, the optimizer needs to find the most efficient method of
retrieving the data. This often includes the use of an index, but how does the optimizer
determine which index?

The Adaptive Server evaluates all possible indexes that match the search arguments and join
clauses in the query. They are compared against each other and against the cost of a table
scan. The Adaptive Server chooses the plan estimated to produce the fewest I/Os. Within a
plan, the Adaptive Server can only use one index per table to satisfy a query unless the query
contains an OR clause. In that situation, the Adaptive Server can apply the OR strategy and
choose more than one index to resolve the multiple search arguments introduced by the OR.

Data Distribution

In order for the Adaptive Server's optimizer to produce an estimate for various query plans, it
must have an idea of what data is already in the table. This is provided by the data distribution
statistics stored in the sysstatistics system table. Distribution statistics are created whenever an
index is created or the update statistics command is run on a populated table. If the Adaptive
Server has index statistics available, the server will estimate the number of rows matching the
predicate criteria using either the distribution steps recorded or the index density. If a column
or table has no statistics available to it, the Adaptive Server uses a series of default
percentages to derive row estimates:

• 10% for equality SARGs (=)
• 25% for closed range SARGs (BETWEEN, or a > and < pair on the same column)
• 33% for open range SARGs (<, <=, >, >=)
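
For example, with no statistics at all on a 1,000,000-row table, the optimizer assumes an
equality predicate matches about 100,000 rows and an open range predicate about 330,000
rows, estimates which may be wildly different from the real row counts.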

In some cases, the Adaptive Server will automatically know which index should be chosen
without the need for statistics. For instance, if there is an equality search argument or join on a
unique index, the server knows at most one row will match and does not need to use statistics.

Index Statistics

The index statistics are stored within the sysstatistics and systabstats system tables. In earlier
versions of the Adaptive Server, the statistics were stored in distribution pages in the
sysindexes table.

For indexes, the Adaptive Server maintains the number of pages and rows in the table, the
height of the index, the number of pages at the leaf level of the index, and the average leaf row
size.

For column data, the Adaptive Server keeps a histogram for the leading column of the index.
These histograms are used to determine the selectivity of a search argument (SARG), which is
an estimate of how many rows in the table should match a particular predicate value. Density,
which is the distribution of keys in the index, is also stored in the statistics. There are also
cluster ratios that measure how fragmented the data has become. This is useful in determining
whether a large I/O (> 2K I/O) can be used.

Only a subset of these statistics (the number of leaf pages, for example) is maintained during
query processing. Other statistics are updated only when you run update statistics or when you
drop and recreate the index. Since the statistics information is stored in a table, it can be
manipulated as part of your performance and tuning work using the optdiag utility program
bundled with the Adaptive Server starting in version 11.9. This command-line utility enables
you to:

• Review a table's statistics.


• Determine when statistics were last updated.
• Extract statistics and load into another environment.
• Generate simulated statistics for use in performance tuning.
Updating Statistics 11.9 and Higher

The Adaptive Server creates statistics whenever an index is created or the update statistics
command is run on a populated table. However, once created, statistics are not automatically
kept up to date. If an index is created for a table that has 1,000 rows, for example, the
statistics will accurately reflect the table's data distribution at index creation time. Given time
and modifications, however, the statistics will become increasingly inaccurate. If the same
table has been modified to the point that it has more than 1,000,000 rows, it is conceivable
that the statistics may still only reflect the initial 1,000 rows. This can cause the optimizer to
make very dangerous choices when formulating query plans. For this reason, it is essential
that you include the update statistics command in your standard maintenance routine. The
update statistics command updates information about the distribution of key values in
specified indexes/columns.

There are actually three versions of the update statistics command:

• update statistics
• update index statistics
• update all statistics

The syntax for the commands are as follows:

update statistics table_name
    [ [index_name] | [(column_list)] ]
    [using step values]
    [with consumers = consumers]

update index statistics table_name [index_name]
    [using step values]
    [with consumers = consumers]

update all statistics table_name

For update statistics:

• table_name - Generates statistics for the leading column in each index on the table.
• table_name index_name - Generates statistics for all columns of the index.
• table_name (column_name) - Generates statistics for only this column.
• table_name (column_name, column_name...) - Generates a histogram for the
leading column in the set and multicolumn density values for the prefix subsets.

For update index statistics:

• table_name - Generates statistics for all columns in all indexes on the table.
• table_name index_name - Generates statistics for all columns in this index.

For update all statistics:

• table_name - Generates statistics for all columns on the table.


Column Statistics

The Adaptive Server maintains histograms of the column data for the purpose of determining
the data distribution of the column. These histograms are created whenever a create index
command is run on a populated table or if statistics are created for a particular column via the
update statistics command. They are updated whenever an update statistics, update index
statistics, or create index command is run. In releases prior to 11.9, data distribution was kept
on a per-index basis. The value of keeping the data histograms on a per-column basis is that
the same histogram can be used for other indexes that share the same column. This also means
that when the statistics are updated for one index, that column will be updated for all the
indexes that share the column.

The update statistics, update index statistics, and create index commands update the histogram
for the column and the density statistics for all prefix subsets while the update all statistics
command updates histograms for all columns in a table.

Statistics are not automatically dropped in the Adaptive Server. This is in direct contrast to
versions prior to 11.9. Dropping an index does not drop the statistics for the index, since the
optimizer can use column-level statistics to estimate costs, even when no index exists. In
order to remove the statistics for a column, even after dropping an index, the statistics must be
dropped explicitly with the delete statistics command.

In the same light, truncating a table does not delete the column-level statistics in sysstatistics.
If a table is reloaded after a truncate table, the statistics for the table will reflect data
distribution in the table prior to truncation. Consequently, the distribution values may be
significantly different from the actual data in the table. In order to correct this, the update
statistics command should be run whenever a table's data is refreshed.

There is some usefulness to having statistics that are not dropped automatically. For instance,
if the data in a table is seriously skewed, the statistics may be manually altered via the optdiag
program (again, this is only recommended for really experienced administrators). If an index
needs to be rebuilt, it can be recreated without affecting the pre-existing index statistics by
specifying 0 as the value for the number of steps in the create index command's statistics
clause:

create index title_id_ix on titles(title_id) with statistics using 0 values

Create/Update Column Statistics

Beginning with ASE 11.9, the ability to create statistics on unindexed columns was added.
Having statistics on unindexed columns can also improve performance. The optimizer can use
statistics on any column in a WHERE or HAVING clause to help estimate the number of
rows from a table that match the complete set of query clauses on that table. Adding statistics
for the minor columns of indexes and for unindexed columns that are frequently used in
search arguments can greatly improve the optimizer's estimates.

Maintaining many indexes during data modification is expensive. Generating statistics for a
column without creating an index gives the optimizer more information to use for estimating
the number of pages to be read by a query, without entailing the processing expense of index
updates during data modification. The optimizer can apply statistics for any columns used in a
search argument of a WHERE or HAVING clause and for any column named in a JOIN
clause. It needs to be determined whether the expense of creating and maintaining the
statistics on these columns is worth the improvement in query optimization.

The following commands create and maintain column statistics:

• update statistics - When used with the name of a column, this command generates
statistics for that column without creating an index on it.
• update index statistics - When used with an index name, creates or updates statistics
for all columns in an index. If used with a table name, it updates statistics for all
indexed columns.
• update all statistics - Creates or updates statistics for all columns in a table.

Good candidates for column statistics are:

• Columns frequently used as search arguments in WHERE and HAVING clauses.


• Columns included in a composite index. These are not the leading columns in the
index, but they can help estimate the number of data rows that need to be returned by a
query.

When to Use Additional Statistics

To determine when additional statistics are useful, run queries using dbcc traceon(302) and
statistics io. Look for significant discrepancies between the 'rows to be returned' and I/O
estimates displayed by dbcc traceon(302) and the actual I/O displayed by statistics io. Look
especially for the use of default density values for search arguments and join columns.
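
A typical session looks something like the following (the query and table are illustrative);
trace flag 3604 simply redirects the dbcc output to your session:

set statistics io on
dbcc traceon(3604, 302)
select title from titles where price > $20
dbcc traceoff(3604, 302)
set statistics io off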

Adding Statistics for a Column

To add statistics for the price column in the titles table:

update statistics titles (price)

Specify the number of histogram steps for a column:

update statistics titles (price) using 50 values

Add a histogram for the titles.pub_id column and generate density values for the prefix
subsets pub_id; pub_id, pubdate; and pub_id, pubdate, title_id:

update statistics titles(pub_id, pubdate, title_id)

Adding Statistics for Minor Columns

To create or update statistics on all columns in an index, use the update index statistics
command. The syntax is:

update index statistics table_name [ index_name]


[using step values]
[with consumers = consumers ]

To create or update statistics for a single column on a table, the syntax is:
update statistics table_name(column_name)
[using step values]
[with consumers = consumers ]

To create or update statistics on all columns in a table, use the update all statistics command:

update all statistics table_name

Choosing Step Numbers for Histograms

By default, each histogram has 20 steps that provide good performance and modeling for
columns that have an even distribution of values. However, there are situations when a higher
number of steps can increase the accuracy of I/O estimates for certain types of columns:

• Columns with a large number of highly duplicated values


• Columns with unequal or skewed distribution of values
• Columns that are queried using leading wildcards in like queries

There are advantages and disadvantages to choosing a number beyond the default. An obvious
advantage is that it provides the optimizer with a more granular, accurate estimate of data
distribution. However, increasing the number of steps beyond what is needed for good query
optimization can actually hurt Adaptive Server performance. This is mostly due to the amount
of space that is required to store and use the statistics, since increasing the number of steps
will increase the space requirements for the sysstatistics table. This also leads to an increase in
the amount of cache needed to read the statistics during query optimization and, potentially,
the amount of I/O required to read the statistics into cache. Note that the space used to store
the statistics during optimization is borrowed from the procedure cache, not the data cache.

Using the delete statistics Command

Deleting indexes has no effect on the statistics stored for the columns in that index. If the
column statistics are no longer needed, then the delete statistics command allows you to drop
statistics for specific columns.

The syntax for deleting statistics is:

delete statistics tablename(column_name)

The following example deletes the statistics for the price column in the titles table:

delete statistics titles(price)

The delete statistics command, when used with only the table name, removes statistics for a
table, even where indexes exist. This will affect how queries are executed, since there will be
no statistics available. You must run update statistics on the table to restore the statistics for
the index.

When Row Counts May be Inaccurate

One of the problems that can occur with statistics is that they can become inaccurate. For
instance, in certain situations, the row count values for the number of rows, number of
forwarded rows, and number of deleted rows may be inaccurate, especially if query
processing includes many rollback commands. If workloads are extremely heavy, and the
housekeeper task does not run often, these statistics are more likely to be inaccurate. Running
the update statistics command corrects these counts in systabstats. Running dbcc checktable
or dbcc checkdb updates these values in memory. When the housekeeper task runs, or when
you execute sp_flushstats or optdiag against a table, these values are saved in systabstats.
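
You can also flush the in-memory counts for a single table on demand (the table name is just an
example):

sp_flushstats titles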

Note The configuration parameter housekeeper free write percent must be set to 1 or greater to
enable housekeeper statistics flushing.

Composite Indexes

A composite index is an index with more than one column in its key. Because it is possible to
create multiple indexes as long as the keys are different, it is also possible to create indexes
that effectively negate the need for others.

Consider the following index and queries:

create index idx1 on


employee (division, department, emp_num)

select * from employee where division = 'abc'


and department = 123
and emp_num = '123-456-789'

select * from employee where division = 'abc'


and emp_num = '123-456-789'

select * from employee where department = 123


and emp_num = '123-456-789'

Question: Which queries could use this index?

Answer: The first two queries can potentially use the index created. A composite index can
be considered as long as one of the search arguments in the query matches the primary
column in the index key (in this case, the division column). Also note that with this index, it is
not really necessary to have an index exclusively for the division column. For the most part,
the idx1 index described should already be enough. This index can serve triple duty on
queries that have search arguments on 'division,' 'division and department,' or 'division,
department and emp_num.' Another index just for division would be redundant and produce
more overhead for the table.

Composite vs. Many Indexes

Consider this example:

select pub_id, title, notes from titles


where type = 'Computer'
and price > $15.

The possible index combinations for the table are:

• CI or NCI on type
• CI or NCI on price
• One index on each type and price
• Composite on type, price
• Composite on price, type
• CI or NCI on pub_id, title, notes, type, price, or some combination thereof

The choice is to make only composite indexes, individual indexes, or some combination of
the two. The choices must be made carefully because each additional index can negatively
impact performance. In the end, the choice will be based on which of the indexes are usable
by the optimizer. This will be determined by the distribution of the data, the cardinality of the
index columns, and the predicates used in the SQL queries.

Clustered Indexes

In tables using allpages locking, a clustered index ensures that the physical order of rows
stored in the data pages is the same as the indexed order of the rows. In tables using
datapages/datarows locking, this is no longer true. Sybase states that the index is used 'to
direct the storage of data on rows and pages, but strict key ordering is not maintained.'

So while the efficacy of a clustered index is influenced by the table's locking strategy, there
are still clear advantages to defining a clustered index on most tables. The following types of
queries will benefit from a clustered index:

• Range queries
• Queries that retrieve a significant percentage of the table's rows
• Queries that require additional sorting due to order and group by
• Queries that would benefit from a merge-join strategy instead of nested loop

For allpages locking tables, another benefit of having a clustered index is that you can avoid
hotspot contention during inserts. Tables lacking a clustered index will always insert rows on
the current active page, the last page in the table's page chain. Since these processes lock at
the page level, this means that only one process can perform an insert at any point in time. In
effect, you now have a single-threaded application. Congratulations! How's your resume
looking?

Finally, a common misconception is that a table's clustered index should always be built on
the primary key. This may be correct if the primary key's columns are always used as your
predicates, but in other situations, it can become a dangerous presumption. Again, study your
application and review the code before defining any indexes on your tables.

Note, however, that hot spot contention can occur, even if you have a clustered index, if it is
built on a column whose values increase sequentially - commonly referred to as a monotonic
key. For example, if your clustered index is built on an ID value that increases sequentially by
a fixed value like an identity column, then you have a monotonic key and your inserts will
always compete with each other for access to the current active page. When selecting your
index, make sure that the key values distribute your data.
Non-Clustered Indexes

Since a given table can contain only one clustered index, all other indexes created on a table
must be defined as non-clustered.

For an OLTP environment, it is a good idea to keep the number of non-clustered indexes to a
minimum, as each additional index increases the overhead for update, delete, and insert
operations. This is not an issue with DSS type applications or EIS (Executive Information
System) type applications, since these do not generate high update volumes.

Generally, if an index key has a 'high index density' (low number of unique values), it is
considered a bad choice for a non-clustered index. Non-clustered indexes tend to be most
effective when less than 20% of the table's data is being retrieved, and they are best suited for
supporting the following types of access:

• Single-column, single-function aggregation (sum, min, max)


• Covered queries (matching or non-matching scans)
• Single-row retrieval

A Comparison between Clustered and Non-Clustered

In the following example, we will examine a possible indexing scenario. Given the titles table
with some 2,000,000 records, we will try to determine which index will provide the best
performance for the query: a clustered index on the price column, a non-clustered index on
the price column, or a table scan.

The query:

select title from titles


where price between $5. and $10.

Given a look at the data in the table, we have also determined that the amount of data with a
price between $5 and $10 is 1,000,000 rows. Let us also assume that a look at the rest of the
columns allows us to determine that the table can fit 40 rows per page.
Clustered Index I/O Cost

With a clustered index on the price column, all the data in the table will be sorted by price.
Once the first valid price is found, the remaining eligible rows can be retrieved by scanning
the table until the next invalid price is found. So the index is really used to find the starting
point for a scan rather than scanning the table from beginning to end.

It has already been determined that the number of rows to be returned will be 1,000,000.
Using the value of 40 rows per page, we can calculate the minimum number of pages that
need to be returned.

1,000,000 rows / 40 rows per page = 25,000 pages

The actual number of I/Os performed to retrieve this data will be determined by how
fragmented the data is, as well as whether or not large I/O buffer pools are available.

Non-Clustered Index I/O Cost

Since a non-clustered index does not guarantee that the data is physically stored in index key
sequence, it is possible that your process will wind up reading the same data pages multiple
times. In a worst-case scenario, each row returned would require a separate I/O, though the
type of I/O, logical or physical, will be a function of cache/buffer pool size and turnover. This
means that retrieving this data using a non-clustered index could require at least 1,000,000
I/Os, plus the additional I/Os required to traverse the index pages.

Table Scan I/O Cost Comparison

The final alternative, a table scan, would yield:

2,000,000 / 40 rows per page = 50,000 pages

This means that the table scan could cost 50,000 I/Os.

The final analysis is:


Clustered index 25,000 I/Os
Non-clustered index 1,000,000+ I/Os
Table scan 50,000 I/Os

In this case, we discover that it is quite possible that the choice of a non-clustered index on the
query may actually be less efficient than a table scan. This means that given the choice
between a non-clustered index and a table scan, the optimizer may choose a scan, and wisely
so.

Index Covering

One way to maximize the performance benefits of a non-clustered index is the covered index
query. A covered query is one in which all the columns in the predicate and the result set are
contained in the index. The index can contain additional columns, but it must contain, at a
minimum, all the columns required by the query. Given this condition, the server will not be
required to read the actual data pages, since all required data is in the index itself. This is very
beneficial because not only do we avoid incurring the I/O required to read the data pages, but
the I/O against an index page is more efficient, since an index page will typically contain
significantly more rows than a data page. Index covering is especially useful with aggregates
when there are no search arguments in the query.

When designing non-clustered indexes, it is a good idea to consider index covering. The
additional column to the index may not take much space, but it may produce an enormous
performance advantage.
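
For instance (a sketch against the titles table, assuming the query below is run frequently), an
index whose key contains every column the query touches lets ASE answer it from the index
leaf pages alone:

select price, title_id from titles
where price between $5. and $10.

create nonclustered index nc_titles_price_cover
on titles (price, title_id)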

On the opposite end of the spectrum, it is also good to avoid making the indexes too wide.
Part of the benefit of index covering is that the key column in the index is significantly
smaller, allowing more results to be produced with less I/O. The closer an index comes to the
full width of the data table, the less useful the index. The number of index levels will increase,
making the index larger and less efficient, and the scan time will begin to approach real table
scan time.

Note Beware of making the index too wide, and remember that changes to data will cascade into
indexes.

Indexing for Multiple Queries

The examples that have been presented in this chapter have one fatal flaw when applied
against a production database: They may not be the only queries applied to the tables in the
database. There may be other queries that use different predicates when accessing these
tables. How does this affect the choice of index design?

Consider the following two queries:

select title from titles


where title = "Alleviating VDT Eye Strain"

select title from titles


where price between $5. and $10.

It would appear that these queries could be supported by either of the following index
combinations:

• NCI on titles (title) and CI on titles (price)


• CI on titles (title) and NCI on titles (price)

Which is the best choice? Answering this question requires an understanding of the entire
application. While you tune individual queries, you design indexes to support the entire
application. When determining whether to build an index to speed up a query, always
consider:

• Predicates commonly used in application's code.


• How frequently the specific query is executed.
• How critical the query is.

Indexing for OR Clauses


select title from titles
where price between $5. and $10. or
type = 'computing'

The Adaptive Server can optimize an OR clause in different ways, depending on the indexing
present, the uniqueness of the result sets, and the cardinality of the data involved. Accessing a
single table using an OR clause is equivalent to performing a union of two separate queries.
The query above, for example, can be re-expressed as follows:

select title from titles
where price between $5. and $10.
union
select title from titles
where type = 'computing'

Because of this, a query containing an OR clause is the only type of query that can be resolved
using more than one index per table. In the above example, if indexes are defined on the price and type columns, both
can be used. If, however, either of the two predicate columns above (price and type) is not
indexed, then the Adaptive Server will scan the table, even if the other column is indexed. It is
also possible to have the optimizer decide to scan the table, even if both predicate columns are
indexed. This is because the optimizer will calculate the I/O cost for index-based access for
each part of the OR clause, as well as any required sorting, and compares it to the cost of a
table scan. If the optimizer determines a scan requires less I/O, it will choose that access
method, even if the columns are indexed. This decision is typically tied to the cardinality,
hence the selectivity, of the columns being used as predicates, as well as how current your
statistics are.

When an OR clause is used with a single table access, it is possible that there are rows in the
table that match both criteria. Using the above example, there could be titles whose price is
between $5 and $10 and whose type is 'computing.' In this case, the OR strategy, also referred
to as using a dynamic index, is chosen by the Adaptive Server. The indexes are not used to
directly return any query results. Instead, they are used to retrieve row IDs which are inserted
into a work table and sorted to remove duplicates. Again, both predicate columns must be
indexed, or a scan is done.
Special OR Strategy (Multiple Matching Index Scans)

For queries that use the OR clause on the same column with different arguments, there is a
special strategy used to resolve the query. Since the queries will refer to the same column,
there should be no need to worry about repeating row IDs. The Adaptive Server will use
multiple matching index scans to resolve the query. In other words, the Adaptive Server will
use the same index multiple times.

Example:

select * from authors


where au_fname in ("Fred", "Sally")

In the preceding example, the query has a search argument on au_fname but requires a match
for either of two values. The query would be converted by the optimizer to:

select * from authors


where au_fname = "Fred"
or au_fname = "Sally"

If an index exists on au_fname, the Adaptive Server will scan through the index once to find
values for 'Fred' and once more to find values for 'Sally.' Since there is no possibility of an
author being 'Fred' and 'Sally,' there is no need for a work table.

Summary

The optimizer decides whether or not to use an index. The optimizer will evaluate alternative
access methods and determine which solution is the least expensive in regard to I/O. Often,
the solution will involve using an index. Clustered indexes will work well for queries that
retrieve data in a range. They can be used to help avoid the added expense of sorting data,
since the data in the table will already be sorted.

Non-clustered indexes work well for locating single rows or for lookups of limited rows of
data. They can be used to help ease the overhead of some types of sorting operations and are
usually less efficient when compared to clustered indexes. However, if the index is designed
as a covering index, the non-clustered index can actually perform faster than a clustered
index. Since the usage of the indexes is dependent on the analysis of the optimizer, and the
optimizer is dependent on the statistics stored in the database, it is important to keep the
statistics up to date. Make sure your maintenance schedule always includes time to update
statistics.

Chapter 5: The Optimizer Statistics


What are the Statistics?

The optimizer is cost-based. That is, it estimates the cost of a given method of accessing the
required data and compares that to the lowest cost access it has found to that point. If the
access is cheaper than that, it marks it as the best option so far and moves on to cost the next access. If
it's more expensive than the cheapest option, it rejects it and moves on.
To estimate the cost of an access, the optimizer uses a number of algorithms and formulas. It
also needs information to use in these algorithms and formulas in order to arrive at a
selectivity value. The information needed by the optimizer is supplied by the statistics.

There are two forms of statistics stored for use by the optimizer: the table/index level statistics
and the column level statistics. Each is critical to the optimizer.

Terms and Definitions

Query plan - This is the set of instructions describing how the query will be executed. This is
the optimizer's final decision on how to access the data.

Costing - This is the process the optimizer goes through to estimate the cost of each query
plan it examines.

optdiag

In this chapter, we'll be looking at the various statistical values by using optdiag output.

optdiag was developed to allow you to easily read, modify, and simulate the statistics. It's a
command-line utility that is external to ASE.

You can query and write to the two statistics system tables directly. However, it's rather
difficult to deal with the numerous varbinary values stored in the tables. Not to mention that
anything you write directly to systabstats will be overwritten quickly.

You can output and input optdiag files. The output files can be edited with any text editor.
Reading in an edited optdiag file writes the changes that you've made to sysstatistics (the
column level statistics). You cannot use optdiag to write the table/index statistics in systabstats;
if you attempt to, it will raise an error.

optdiag output is an important and vital tool whenever you need to either diagnose an
optimizer issue or examine how your data is distributed.

See the Sybase ASE documentation, specifically the Performance and Tuning Guide, for
details on the syntax and general uses of optdiag. At the end of this chapter is some detailed
material on optdiag simulate mode.

Types of Statistics

The optimizer uses two types of statistics to estimate the cost of an access: table/index level
and column level. Each statistic type is stored in its own system table and is critical to the
optimizer's cost estimations.

Table/Index Level Statistics

The table/index level statistics describe the table and its indexes to the optimizer. The
optimizer must know about the objects in order to estimate the cost of various access
methods. The cost of a table scan is compared to the cost of every possible index access method. To
get this cost, the optimizer uses the number of rows and pages in the table. The number of
index rows, index pages, and, in the case of a non-clustered or DOL (data-only locked)
clustered index, the data row cluster ratio are used to estimate the cost of using an index. The
data page and index page cluster ratios are used to estimate the cost of using a large I/O
access.

The table/index level statistics are stored in systabstats. Some of the values are maintained
dynamically in memory as changes occur to the data. They are page count, row count, deleted
rows, forwarded rows, and the CR counts, which are used to compute the cluster ratios. These
values are stored in a memory structure and updated there. As the data changes, the values
also change.

The optimizer will read the values in memory and not those stored in systabstats. If the values
are not in memory, the optimizer will read them into memory. The values in memory are
written (flushed) to systabstats in a number of ways - update statistics, running optdiag, a
clean shutdown, running sp_flushstats, and, most commonly, by the housekeeper task as part of its
normal routine. The values in systabstats can vary from those in memory until they are
flushed to the table.

Once the values are in memory, they will stay there until the memory structure is needed by
another process within the dataserver.

Let's take a look at some of the various table/index level values.

Statistics for table: "lineitem"


Statistics for index: "c1" (clustered)
Index column list: "c"

If this is an APL (allpages locked) table and there's a clustered index, the values will be
labeled for the clustered index. The table's name will appear above the clustered index name
and column list. If this is a clustered index on a DOL table, the clustered index will be listed
the same as a non-clustered index.

Data page count: 44308
Empty data page count: 0
Data row count: 600572.0000000000000000
Forwarded row count: 0.0000000000000000
Deleted row count: 0.0000000000000000
Data page CR count: 5539.0000000000000000
OAM + allocation page count: 181
First extent data pages: 0
Data row size: 133.4573373384040451
Derived statistics:
Data page cluster ratio: 1.0000000000000000
Space utilization: 0.9035689867593812
Large I/O efficiency: 1.0000000000000000

Index:
Statistics for index: "lineitem_sdate" (nonclustered)
Index column list: "l_shipdate"
Leaf count: 1822
Empty leaf page count: 0
Data page CR count: 569926.0000000000000000
Index page CR count: 231.0000000000000000
Data row CR count: 583396.0000000000000000
First extent leaf pages: 0
Leaf row size: 6.0497142723936514
Index height: 2
Derived statistics:
Data page cluster ratio: 0.0233102653424636
Index page cluster ratio: 0.9981179422835633
Data row cluster ratio: 0.0308774251075029
Space utilization: 0.9960645830568992
Large I/O efficiency: 1.0000000000000000
Data page count: 44308

This is the number of data pages in the table. The optimizer uses this value to establish the
cost of performing a table scan. This cost is the 'base cost' and is compared to all other access
methods costed by the optimizer. This value is fundamental to cost-based optimization.

Leaf count: 1822

This is the number of pages in the index's leaf level if it's a non-clustered index or a clustered
index on a DOL table.

Empty data page count: 0
Empty leaf page count: 0

Empty pages are generally a space management issue. However, as they increase, the
clustering decreases, making large I/Os less efficient. The more rows on a page, the more that
can be read in a single I/O.

Data row count: 600572.0000000000000000

This is the number of rows in the table. This value will be used to set an average number of
rows per page. It is a fundamental value in the optimizer's costing.

Forwarded row count: 0.0000000000000000

Forwarded rows only exist on DOL tables. They occur when a row has been moved off of its
original page to another page. The optimizer must cost each forwarded row as two I/Os - one
to read the original page and follow the pointer and another to read the row's current page.

Deleted row count: 0.0000000000000000


This is the number of rows that have been marked as deleted but whose space has not yet been
compacted by reorg or reused by another row. You can use this value to track fragmentation of
the table. However, this is not only a space management issue: a page has to be read to get the
required rows, and the more useful rows there are on each page, the more efficient each read
will be. Deleted rows take up space those rows could occupy.

Data page CR count: 5539.0000000000000000

We'll talk about all three of the CR counts later in this chapter.

The Cluster Ratios

Derived statistics:

All values in the derived statistics areas are computed values and not actually stored
anywhere. They are computed from the CR counts (see the following section).

Let's take a look at the various derived values and what they're used for, what they mean, and
where they come from. Remember that a derived value of 1 is perfectly clustered. The further
from 1, the less clustered.

Data page cluster ratio: 0.0233102653424636

This value measures how well clustered data pages are in relation to their extents. If pages can
be read sequentially without jumping back and forth among extents, the data pages are
perfectly clustered. As things get more and more unclustered, more jumps between extents
must be done to read pages sequentially. In this example, the DPCR (data page cluster ratio)
is low and indicates some fragmentation.

On an APL table, there is only one DPCR value for the table. For a DOL table, there may be
two DPCR values, one for the table and one for the clustered index, if there is one. For both
table types, the table's DPCR can be used to estimate how much fragmentation there is in the
table.

Index page cluster ratio: 0.9981179422835633

This value is the same as the DPCR, except it's measuring the clustering of index pages.

The DPCR and IPCR (index page cluster ratio) are used to estimate the cost of using a large
I/O. If clustering is low, a large I/O will be less efficient because it is likely that more than
one extent will need to be read. For example, the DPCR above is very low. It indicates that
there would be many jumps between extents to read these pages. It's very unlikely that a large
I/O would be used.

Data row cluster ratio: 0.0308774251075029

The DRCR (data row cluster ratio) is used differently than the other two cluster ratios. It
measures how well clustered data rows are on data pages in relation to rows in the leaf level
of the index. Put another way, it measures how many jumps between data pages will have to
be done to read the rows in the order they appear in the leaf.

If you have a clustered index on an APL table, there will be no DRCR reported since the leaf
of the index is the data pages. For most non-clustered indexes, and in some cases clustered
indexes on DOL tables, the value is usually low. The only time when this is not the case is
when the index's leading column is the same as the clustered index's.

This value is used to help estimate the efficiency of a non-clustered index. In previous
versions of ASE, the costing of a non-clustered index access was 'pessimistic.' It was assumed
that each read of a data row from the leaf of a non-clustered index would cost one I/O. This
was assumed no matter how many rows on the data page qualified. There was no way to
measure how well rows were clustered on data pages in relation to the leaf of an index. This is
why it was common to see an efficient non-clustered index rejected in pre-11.9.2 ASEs.

Space utilization: 0.9960645830568992

This is an estimate of how well space in the pages of the table or index is being used. The
average row size and the number of rows are used to estimate the minimum number of pages.
This value is then compared to the current page count to get the space utilization value. This
is a useful value to use in order to get a quick idea of how fragmented the object is.

Large I/O efficiency: 1.0000000000000000

This value estimates the number of useful pages that will be read in a large I/O. You can think
of this as a cluster ratio for a large I/O. As with the other values in this section, the closer to 1,
the better. If you only have pools of 2K, this value will always be 1. Thus, it won't be useful
unless you have larger I/O sizes available. The larger the I/O, the more pages that can fit into
the large I/O read and the more that is being measured by this value.

Space utilization and Large I/O efficiency measurements first appeared in ASE 12.0 optdiag.
Neither of these values is used by the optimizer; they are there for you to use to help estimate
table fragmentation and how efficient a large I/O will be.

The CR Counts
Data page CR count: 569926.0000000000000000
Index page CR count: 231.0000000000000000
Data row CR count: 583396.0000000000000000

The CR count values are the 'raw' numbers that are used to compute the derived values. They
are the actual number of jumps that were done either between extents or data pages. If you
want to track clustering without using optdiag, you can query systabstats. As the CR count
values increase, the derived cluster ratio values will decrease.
How Can I Read the Cluster Ratios without Going to optdiag?

The formula that is used to compute the derived cluster ratio values is proprietary. However,
you can get a very good sense for clustering by querying systabstats without having to go to
the command line and use optdiag.

Watch the CR count values. As they increase, the derived values will decrease, becoming less
clustered. If the CR counts decrease, clustering is better.

When few changes to the data are occurring, try this technique:

Run the script below. sp_flushstats copies the most recent table/index level statistics from the
in-memory copy to systabstats; in most cases, you won't need to run it because the housekeeper
will flush the values to systabstats as part of its regular work. This gives you a baseline to use
when monitoring the values. From time to time, rerun the script and track any changes.

Note If you want the table level CR counts, and the table is either DOL, or APL without a
clustered index, drop the clause 'and t.indid != 0.'
sp_flushstats table_name
go
select
i.name,
t.dpagecrcnt "DPCR Raw",
t.ipagecrcnt "IPCR Raw",
t.drowcrcnt "DRCR Raw"
from sysindexes i, systabstats t
where t.id = object_id("table_name")
and t.id = i.id
and t.indid = i.indid
and t.indid !=0
go

Column Level Statistics

Put simply, the column level statistics describe the distribution of values in the column. There
are two types of values used to do this - the histogram and the density values.

The column level statistics are stored in sysstatistics. Prior to 11.9.2, statistics describing the
data were stored on a single 'distribution page'; each index had only one. Thus, the space for
column statistics was limited. If a column was not the leading column (the first listed column
of an index), it could not have statistics. Without statistics, the optimizer had no choice but to
make hard-coded assumptions for such a column for any SARG or join it was involved in. It
was impossible for the optimizer to have a full and accurate picture of an index when it was
costing a query.

If an index was dropped, the statistics would be dropped also.

In 11.9.2 and above, the column level statistics are an attribute of a column and not an index.
This makes it possible to put statistics on any and all columns you like. In particular, it is
possible to add statistics to inner columns of a composite index or join columns that are not
part of an index. We will discuss more on adding statistics later.
In pre-11.9.2, the number of steps in the distribution page was determined by the size of the
datatype. You could not change the number of steps. With large datatypes, the histogram on
the distribution page could become less granular and less accurate. Since the space for
statistics was limited to one page, they weren't able to stay accurate as the number of rows in
the column grew. Since the density values were also stored on the same page as the
histogram, they had the potential to take space from the histogram, making it less accurate.

Since the column level statistics are now stored in sysstatistics, the space for statistics is
limited only by the size of the database.

You can specify the number of steps to use in creating the histogram, and you can modify any
of the column level statistics (those stored in sysstatistics).

Let's take a closer look at the values:

Statistics for column: "column_name"

This indicates the beginning of statistics for the column.

Last update of column statistics: Apr 5 2001 3:29:15:890PM

This is the last date and time that this column's statistics were modified. Modification can
happen via create index, update statistics, or inputting an optdiag file. This satisfies a long-
term feature request - 'How do I know when update statistics was last run?'

The Density Values


Range cell density: 0.0000516598992857
Total density: 0.0000516598992857

The density values describe the average number of duplicate values in the column. Each is
used in different ways by the optimizer.

The closer to 1 a density value is, the denser the column is. A good example is a gender
column; its density would be 0.5000 (or very close). The closer to 0, the closer the column is to
being unique. Since the displayed value has finite precision, it may read 0 even though the
column is not absolutely unique; it will, however, be very close.

The total density value is the average number of duplicates in the entire column. It is used by
the optimizer to estimate how many rows will be returned for each scan of the table for a join.
Let's say the table has 100,000 rows and a total density of 0.001; the estimate for rows to join
in this table will be 100. It's also used as the default selectivity value for an equi-SARG (col =
value) when the value is unknown. This usually happens when a local variable is declared and
used within a batch. We will discuss the unknown values and the default selectivity values
shortly.

The total density value can be disproportionately affected when there are highly duplicated
values in the column. When this happens, the optimizer may be pessimistic and estimate that
more rows in the column will qualify for the join than actually will. In turn, this can result in
less than optimal query plans. See the 'Adding, Modifying, and Deleting Column Level
Statistics' section of this chapter for more information on changing the density values.

The range cell density is a measure of the average number of duplicates in the column for
values that are not highly duplicated. It is used to help estimate the selectivity of a SARG
when the SARG value falls into a range cell (more on cell types in the section titled 'The
Histogram Values').

If the total and range cell density values vary at all, you know that there are some highly
duplicated values in the column.

The Default Selectivity Values


Range selectivity: default used (0.33)
In-between selectivity: default used (0.25)

There are times when the optimizer does not know the SARG value when it costs a query.
This usually happens when a local variable is declared, a value is selected into it, and it is
used in a batch query or a procedure.

declare @var1 int
select @var1 = 100
select_clause
from_clause
where column = @var1

In such a case, the optimizer has to use default values to estimate selectivity for the clause. If
there are no statistics on the column, the optimizer must use hard-coded values based on the
operator of the clause. For an equality, the value used for selectivity is 0.10 (10%). For an
open range SARG (<, <=, >, >=), it's 0.33 (33%), and for a closed range or between, it's 0.25
(25%).

If there are statistics on the column, the stored default selectivity values will be used. As
mentioned earlier, the total density value is used for equality SARGs, while the range and
'in-between' default selectivity values, labeled as such in the output above, are used for open
and closed range SARGs. Each of these values can be modified via optdiag. The range and
in-between default selectivity values will not be overwritten when update statistics runs, but
the total density will be. If you have queries that may result in unknown SARG values, consider
changing the default selectivity values. I would suggest adding a couple of zeroes to the right
of the decimal point. Here's an example of how to edit the optdiag output file to change the
range selectivity:

Range selectivity: default used (0.33)

changed to:

Range selectivity: 0.0033

Keep in mind that if you change the total density value to affect equality SARGs with
unknown values, you will also affect all joins of the column. Test it before modifying the total
density for this purpose. You'll only need to consider changing the total density if it is
relatively high, and this will only occur when there are highly duplicated values in the
column. The fewer the number of highly duplicated values, the closer the two numbers are; if
there are no highly duplicated values, they will be the same.

In previous versions of ASE, the equivalent of the range cell density was used as the default
selectivity for equality SARGs.

The Histogram

The histogram is an ordered set of values that represent the distribution of values in the
column. The histogram consists of cells and their weights.

The distribution page found in pre-11.9.2 versions of ASE also held a histogram, but it did not
contain as much detailed information about the distribution of values. The histogram in 11.9.2
and above is far more granular. You can read it and modify it via optdiag; in earlier versions
this was difficult at best. The histogram in pre-11.9.2 was a simple equi-height type. That is,
each step represented exactly the same number of rows. The histogram in 11.9.2 and above is
a 'pseudo equi-height.' That is, each step (cell) will represent close to the same number of
rows, unless there are highly duplicated values. So, if the column is unique, each cell will
represent the same number of rows, except for the first cell which represents the NULL values
(discussed later) and the last cell, which may contain fewer values.

This is a good place to discuss the difference between steps and cells and the various types of
cells now available.

Steps and Cells

Steps can be thought of as the values that are read from the column to establish the boundary
values (of the cells). Every x number of rows, update statistics will read a value and save it for
use as the boundary value. The x number of rows is determined by the number of rows in the
column and the number of requested steps (more on requested steps soon).

Cells can be thought of as representing a certain percentage of values in the column. A cell
consists of all the values between the boundary value and one significant bit greater than the
previous boundary value. A cell is said to be inclusive of the upper boundary (the printed
boundary value) and exclusive of the lower boundary (the previous boundary value).

The steps are the points within the column where sample values are taken to establish the
boundary values. Cells are used by the optimizer to estimate the cost of SARGs. Since cells
represent a percentage of rows in the column, this value needs to be recorded. The 'weight'
seen in the histogram is the percentage of the rows of the column represented by that cell.

Let's take a closer look at the various components of the histogram:

Histogram for column: "column_name"

This line indicates the beginning of the column's histogram values.

Column datatype: integer

This is the column's datatype. It is useful when doing an analysis.

Requested step count: 20
Actual step count: 20

These values represent the number of steps you requested (or the default of 20) when you
created the index or ran update statistics and specified a number of steps to use. If you don't
specify a number of steps to use, one of two things will happen. If the column did not
previously have statistics on it, the default of 20 steps will be used. If the column does have
statistics on it, then the previous requested step count will be used. At the end of your create
index or update statistics command, add 'using x values,' where x is the number of steps you
want to specify.
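
For example (a hedged sketch; the table and index names are taken from the pubs2 sample
database and are used only for illustration):

update statistics titles titleind using 40 values

This requests 40 histogram steps for the leading column of the titleind index.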

There are times when you do not get the same number of steps that you requested. This
happens because a smaller number of cells are needed to represent the data than were asked
for. You'll often see this when the column has a high number of duplicate values.

The Histogram Values

Let's take a look at the histogram values themselves. The histogram values consist of the step
number (this is also the cell number), the weight, the operator, and the boundary value. As we
discussed earlier, there is a difference between steps and cells - steps are the values sampled
from the column to create the boundary value of a cell, and cells are representative of rows in
the column. In your 'real world' work, the operators are not as important as the weights. The
operators help indicate the type of cell. However, keeping an eye on the cell weight is far
easier when examining the histogram.

The First Cell

The first cell represents the NULL values in the column. If the weight of the first cell is
anything other than 0, the column contains NULLs and the weight of the cell is the percentage
of NULLs in the column. This is a major change from pre-11.9.2 handling of NULLs in the
statistics. In the old distribution page, NULL values were placed on steps beginning with the
first step and, depending on how many NULLs were there, continued to consume steps. The
more NULLs, the less space for steps containing actual values. This made the statistics less
and less accurate and, in many cases, had an adverse effect on the optimizer because NULLs
were affecting its cost estimates, even if they weren't involved in the query. Since NULLs are
now represented only by the first cell, they do not encroach on the space for cells of actual
values. They also will not be involved in costing, unless specified in the query.

Step     Weight         Value
1        0.06893241     <= 0
2        0.01010703     <= 6049
3        0.01010703     <= 12161
4        0.01010037     <= 18211

In the example above, the column contains just about 7% NULLs.

Cell Types

There are two basic types of cells - range cells and frequency count cells (FC). Range cells
represent more than one value, while frequency count cells represent only one value. A
frequency count cell is more accurate because the optimizer does not need to make any
estimates of how much of the cell (and the column) will be qualified by the SARG.

Here's an example of a range cell:

3        0.05268644     <= 2121
4        0.05265480     <= 3181

Cell 4 represents all values in the column that fall between 2122 and 3181. The rows this cell
represents occupy 5.265480% of the column. Thus, the weight is the percentage of the column
occupied by the rows of the cell.

There are two sub-types of frequency count cells - dense and sparse. A dense frequency count
cell occurs when the values of the column are contiguous (for example, 1, 2, 3, 4, 5). A dense
FC will look like a range cell in optdiag output. Only its weight will tell you it's a frequency
count cell.

10       0.05265813     <= 10549
11       0.15365647     <= 10550
12       0.05266812     <= 11590

Cell 11 is a dense frequency count cell.

The other type of frequency count cell is a sparse FC. A sparse FC will occur when the values
of the column are not contiguous (for example, 10, 20, 30, 40). They have a different
appearance in optdiag output.

1        0.00000000     <  0.0000000000000000
2        0.11044637     =  0.0000000000000000
3        0.00000000     <  10.0000000000000000
4        0.11040641     =  10.0000000000000000
5        0.00000000     <  20.0000000000000000
6        0.11114071     =  20.0000000000000000

Here, all of the value cells are sparse FCs. Since the values are not contiguous, a 'dummy' cell
with a weight of 0 and the same boundary value is used as the lower bound for each sparse FC.
This represents the fact that there are no values between the sparse FC value and the previous
one. As you can see, it's easy to recognize sparse FCs when reading optdiag output. In the
example above, the values 0, 10, and 20 each occupy a little more than 11% of the column.

It is possible to get an FC for each distinct value in the column. This will happen on its own if
the number of distinct values in the column is low enough and the number of requested cells
is large enough. If the histogram contains all sparse FCs, the number of cells will be double
that of a histogram with all dense FCs.

FCs and Cell Width - Making FCs

As mentioned earlier, when the column level statistics are gathered by the create index or
update statistics commands, a sample value is read from the column every x rows and used as
the boundary value. The number of rows between samples (x) is determined by the number of
rows in the table and the number of requested steps.
As the rows are read, the values are tracked. If a value occupies 50% or more of the rows
within a cell, it's marked as a possible FC. If there are enough requested steps, the FC will be
created. It's possible that an FC may occur even though it represents less than the width of
other cells. Since FCs are the most accurate type of cell, we want to use them if at all possible.

Cell width is the number of rows in the column divided by the number of requested steps
minus 1. You can use this simple formula to help you determine how many steps to request if
you want more FCs.
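
To put numbers to this: for the lineitem table shown earlier, with 600,572 rows and the default
of 20 requested steps, each cell covers roughly 600,572 / 19, or about 31,600 rows, so a value
would need to occupy about 15,800 rows to become a frequency count candidate. Requesting 100
steps narrows each cell to roughly 6,066 rows, and a value occupying only about 3,033 rows can
then get its own FC.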

Multi-Column Density Values

If the column is the leading column of a composite index, there will be multi-column density
values for the column. These will appear right after the histogram in column groups.

The multi-column density values are essentially the same as the single column density values,
except they apply to a subset of columns rather than a single column. They measure the
average number of duplicate values in the given subset.

When you join columns of a composite index in a query, the multi-column total density value
can be used. However, the columns have to be consecutive. That is, if you join column A and
B in the following example, the multi-column total density value for the column group A,B
will be used. If you were to join A and C, then each of their individual column total density
values, or a default selectivity value if there are no statistics on the column, will be used.

Take a look at the example below:

Statistics for column: "A"


Last update of column statistics: Jun 18 2001 1:52:46:110PM
Range cell density: 0.0189719134425181
Total density: 0.2412580561291586

Above are the column's density values; these are the values we discussed earlier. There's an
index consisting of columns A, B, and C (in that order). Since column A is the leading
column of the index, the multi-column density values belong to it. As you can see, column A
is rather dense - a total density of about 0.24 means that, on average, there are only about four distinct values in the column.

Statistics for column group: "A," "B"


Last update of column statistics: Jun 18 2001 1:52:46:110PM
Range cell density: 0.0068680810117574
Total density: 0.0350818042482194

This is the first subset of columns. Now we're looking at the combined average number of
duplicates for the two columns. In this case, both values have fallen because column B is less
dense. You can tell that column B is still dense enough not to have a large effect on the multi-
column total density value. Its total density is 0.1428588182999980.

Statistics for column group: "A," "B," "C"


Last update of column statistics: Jun 18 2001 1:52:46:110PM

Range cell density: 0.0000009818306758


Total density: 0.0000019032028302

Column C is now included in the subset. Its total density is 0.00000832 - not very dense at all.
The multi-column densities for this subset have dropped dramatically.

If you join all three columns of this index, very few rows will be estimated to qualify for the
join. The bottom line here is that the more consecutive columns of a composite index you use
in a join, the more likely it is that the estimated number of rows will be low.

No Statistics List

At the end of the optdiag output is a list of columns that don't have statistics on them. This can
be very handy to determine where statistics may need to be created. Columns in the list will
have to use default selectivity values when the column is used in a SARG or join.

No statistics for remaining columns:    "D"
(default values used)                   "E"
                                        "F"
                                        "G"

Understanding update statistics

You've used update statistics for a long time. In pre-11.9.2 ASE, it would update existing
statistics only for the leading column of an index or indexes. This was usually accomplished
by using the syntax:

update statistics table_name [index_name]

So what happens when update statistics runs in this case? Update statistics scans the index
leaf and does two things. First, it takes a sample value from the leading column of the index
every x rows (we discussed this earlier) to use as a step, or boundary, value. Second, as it reads
all the rows, it gathers the density values. It also updates the various table/index level
statistics. This is very straightforward.

ASE 11.9.2 introduced new update statistics functionality and syntax. As we discussed earlier,
column level statistics now belong to columns and not indexes. Thus, they can be placed on any
column, whether it is the leading column of an index, an inner column of an index, or not in an
index at all. To do this, update statistics had to be enhanced. The enhancements deal with ways
to put statistics on individual columns.

Let's take a look at the new enhancements, what they do, how to use them, and any issues
surrounding them.

• update statistics table_name [index_name]


This is the syntax that's been around for a very long time. It still updates statistics on
the leading column of the indexes of the table or the specified index. It reads the leaf
of the index and does not need to sort the values.

• update statistics table_name (column_name)

This will create new or update existing statistics on the specified column only. The
table will be scanned, and a sort will have to be done before the statistics are gathered.
The sort will be done in a work table in tempdb. Obviously, reading the values into the
work table, sorting it, and scanning it to gather the statistics will increase the required
I/O. It will also consume tempdb space for the work table. However, keep in mind that
having statistics on inner columns of composite indexes or on non-indexed columns in
the vast majority of cases will be an excellent trade-off when compared to the added
time to run update statistics.

• update index statistics table_name [index_name]

This new functionality will create or update statistics on all columns of all indexes on
the table or of the specified index. This is much the same as running update statistics
on each column of the index individually. Update index statistics will work one index
at a time, but you don't have to list each one. We'll go into the details on why it's a
good idea to test adding statistics to all columns of an index in the 'Adding, Modifying,
and Deleting Column Level Statistics' section.

You need to keep in mind the added I/O and tempdb usage when running update index
statistics. Be prepared to allow more time and more tempdb usage.

• update all statistics table_name

This enhancement will create or update statistics on every column of the table. I
strongly advise that you stay away from this syntax for a number of reasons. First of
all, it can take a very long time to run. Imagine gathering statistics on every column of
a multi-million row table. Imagine if there were 60 columns in the table! Obviously,
your maintenance time will increase dramatically. It is unlikely that you will ever need
statistics on every column of a table.

Keep in mind that if the column that update statistics is running on is not the leading column
of an index, a sort of a work table in tempdb will have to be done. If the column is not in an
index, a table scan will have to be done. This should not scare you away from adding
statistics. As you'll see, the added cost of update statistics is far outweighed by the improved
query plans that will result.
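
To make the variants concrete, here is a hedged sketch using a hypothetical table named orders
with a composite index named orders_idx:

update statistics orders orders_idx        -- leading column of orders_idx only
update statistics orders (ship_status)     -- one named column; needs a sort in tempdb
update index statistics orders orders_idx  -- every column of orders_idx
update all statistics orders               -- every column of the table (rarely advisable)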

While we're on the subject of update statistics, let's discuss an age-old 'rule of thumb' that has
become a legend but is not wholly accurate: 'Update statistics should be run whenever the
data changes by 5 to 10%.'

This is not necessarily right or wrong. In general, it's a fine guideline. How often to run
update statistics is completely dependent on changes to the distribution of values in your
dataset, the work being done, and the efficiency of the query plans being generated by the
optimizer. Some datasets, whose distribution changes regularly, may need to have statistics
updated very often and others less often. Some datasets may never need to have update
statistics run in order for the optimizer to generate the most efficient query plans. You need to
test and monitor performance in order to determine the optimal intervals in which to run
update statistics. You know your data, so don't be afraid to not run update statistics if it's not
needed. You may save yourself some maintenance time.

Adding, Modifying, and Deleting Column Level Statistics


Why Add, Modify, or Delete Statistics?

Adding, modifying, or deleting column level statistics can result in more efficient query plans
than in previous versions of ASE. By adding or modifying statistics, you provide the
optimizer with a great deal more information about your data than was previously possible.
By deleting statistics, you can restrict the information the optimizer has about the data; at
times, this too can be an efficient approach.

Before we continue, keep in mind that adding, modifying, or deleting statistics is not
absolutely necessary. However, it is highly recommended that you consider taking advantage
of the flexibility and power of the new statistics and test this functionality in your tuning. You
may find that it dramatically affects the efficiency of your query plans. On the other hand, you
may find that for your datasets, keeping the statistics as they were previously is the most
efficient way to go. This is by no means a 'black and white' area of ASE.

When you run any form of update statistics, the statistics are gathered based on the actual
values in the column. You can think of these as the 'real' or 'actual' statistics. In the majority
of cases, these values are just fine for your queries. There are times, though, when tweaking
these values may improve things for you.

Adding Column Level Statistics

Adding statistics to columns, indexed or non-indexed, is the most common way to take
advantage of the statistics-related functionality. As mentioned earlier, adding statistics will
give the optimizer more information about the column, whether it's part of an index or not.
Let's look at the effect of adding statistics to inner columns (minor attributes) of composite
indexes first.

With statistics only on the leading column of a composite index, the optimizer is limited to
information about that column only and must make assumptions about any other columns in
the index. The additional statistics on inner columns give the optimizer a complete view of the
index.

Here's an example using a simple query:

select * from test
where col1 > 200
and col3 = .40
and col2 <= 300

In the following example, statistics are only on col1, the leading column of the composite
index, which contains col1, col2, and col3. You can see that without statistics on col2 and
col3, the optimizer has to use default selectivity values to estimate the selectivity for those
columns. These assumptions are not likely to be accurate.

traceon 302 output:

Estimated selectivity for col1,
    selectivity = 0.999597.
No statistics available for col2,
    using the default range selectivity to estimate selectivity.
Estimated selectivity for col2,
    selectivity = 0.330000.
No statistics available for col3,
    using the default equality selectivity to estimate selectivity.
Estimated selectivity for col3,
    selectivity = 0.100000.
costing 22243 pages, with an estimate of 16493 rows.
Search argument selectivity is 0.032987.

Now we add statistics to all columns of the index: col1, col2, and col3.
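
One way to do this is with the update index statistics command described above (a hedged
sketch; the book does not show the exact command used for this example):

update index statistics test

With statistics now in place on col1, col2, and col3, the traceon 302 output becomes: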

Estimated selectivity for col1,
    selectivity = 0.999597.
Estimated selectivity for col2,
    selectivity = 0.014925, upper limit = 0.052684.
Estimated selectivity for col3,
    selectivity = 0.020003, upper limit = 0.060138.
costing 5898 pages, with an estimate of 149 rows.
Search argument selectivity is 0.000298.

statistics io output:

Table: test scan count 1, logical reads: (regular=5895 apf=0 total=5895),
physical reads: (regular=143 apf=0 total=143), apf IOs used=0
Total writes for this command: 0

(147 rows affected)

After adding statistics to the inner columns of the index, the estimated cost of the query is far
more accurate.

Now let's take a look at the effects of adding statistics to a non-indexed column that's
participating in a join.

A simple join:

select * from test t, test2 t2
where t.col1 = t2.col1
and t.l_orderkey > 200
and t.col2 = 100
and t.col1 <= 300

Without statistics on test2.col1:

Traceon 302 output:

Estimated selectivity for col1,
    selectivity = 0.100000.

With statistics on test2.col1:

Estimated selectivity for col1,
    selectivity = 0.000052, upper limit = 0.052684.

Statistics on the non-indexed column t2.col1 resulted in more accurate cost estimates. In the
case of this simple join, the presence of statistics on the non-indexed column resulted in a
different join order being used.
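
The statistics on test2.col1 would be created with a command like the following (a hedged
sketch; because col1 is not indexed on test2, this requires a table scan and a sort in tempdb):

update statistics test2 (col1)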

Having statistics on columns that participate in joins is especially useful when you enable
sort-merge joins in ASE 12.0 or above.

Changing the Number of Requested Steps

The number of steps (cells) in the histogram of a column can have a direct effect on the
optimizer. If you create an index or statistics on a column that has no statistics, the default
number of steps (20) is used. If statistics exist on the column, the number of requested steps
used will be the last number of requested cells used, unless you specify a different value. You
can also specify the number of steps to be used.

There are a couple of reasons that you may want to change the number of steps for a column.

The first is to get more frequency count cells in the histogram. As mentioned earlier,
frequency count cells represent only one value. This makes them the most accurate type of
cell because their weight is the percentage of the column occupied by the value.

The second reason to increase the number of requested steps is to make the column's
histogram more granular. A more granular histogram makes it easier for the optimizer to
accurately estimate the cost of a SARG. As the number of steps increases, the number of rows
represented by each cell decreases. By making the cells more narrow, they become more
granular.

Narrow cells are particularly helpful to the optimizer when estimating the selectivity of range
and equi-SARGs. In the case of range SARGs, the optimizer estimates how close the SARG
value falls to either boundary of a cell and uses this to estimate selectivity. The more narrow
the cells, the more accurate the estimate. For equi-SARGs, the weight of the cell that the value
falls into is used as one of the two selectivity values for the column. The smaller that weight,
the more selective the optimizer considers the column to be.

Keep in mind that this applies only to range cells, since they represent more than one value. When a
SARG value falls into a frequency count cell, no estimations are needed since the cell
represents only one value.
As mentioned earlier, update statistics has been extended to allow you to place statistics on
columns. It has also been extended to allow you to specify the number of steps to use.

For example:

update statistics table_name using 100 values

The above syntax will update statistics on the leading columns of all indexes in the table and
will use 100 as the number of requested steps.

update statistics table_name (col1) using 100 values

This syntax will create or update statistics on the specified column using 100 as the requested
number of steps.

The number of steps can also be specified in create index with the same x values syntax used
with update statistics.
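
In create index, the clause appears in the with option list. A hedged sketch, using hypothetical
table and index names:

create nonclustered index orders_idx
on orders (order_date, ship_status)
with statistics using 100 values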

There is a slight trade-off when increasing the number of steps. Steps require memory taken
from procedure cache whenever the optimizer needs to read them. The amount of memory is
small, but the larger the datatype of the column and the more steps in the histogram, the more
that is needed. Also, each histogram cell for the column has to be read into cache so that the
optimizer can use them. Only in the most extreme cases can this have an adverse effect on
parse and compile time. In the vast majority of cases, the more efficient query plans that result
from more steps far outweigh the cost of caching the steps.

There is no set 'rule' for the number of steps to request. You will need to test to determine
what works best for you. If you upgrade from a pre-11.9.2 version, you will 'inherit' the
number of steps from the old distribution page as the number of requested steps for the column.
In most cases, this number of steps is just fine.

Modifying Column Level Statistics

Along with adding column level statistics, you can modify them directly. Every column level
statistical value can be changed manually. Use optdiag to do this since it's not advisable, nor
is it supported by Sybase, to write directly to the system tables. There are a number of reasons
you may want to modify the statistics.

Before moving on, let's talk a bit about how to write the statistics. As mentioned, use optdiag
to do this by getting an optdiag output file. Use the -o file_name option at the command line
(see the Sybase Performance and Tuning Guide for full syntax). Once you have the file, you
can begin to modify the statistics. Use any text editor to do the job. Save a copy of the original
optdiag output file in case you need to start over.

Once the file is edited, you can read it back in via optdiag using the -i option.
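
As a hedged sketch of the round trip (the login, server, and file names here are placeholders;
see the Performance and Tuning Guide for the full syntax):

optdiag statistics pubs2..titles -Usa -Psa_password -SMYSERVER -o titles.opt
(edit titles.opt with any text editor, keeping an unedited copy)
optdiag statistics -i titles.opt -Usa -Psa_password -SMYSERVER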

When the optimizer uses statistics that have been modified, traceon 302 will print this
message:

Statistics for this column have been edited.


Modifying the Total Density Value

As mentioned earlier, the presence of highly duplicated values (frequency count cells in the
histogram) can have a disproportionate effect on the total density value. This can make the
optimizer pessimistic when costing a join of the column.

So, do you change the total density or not? If you change the value, what do you change it to?

If there is a significant degree of duplication, the total density value will be relatively high.
Remember that a value of 1 means that the column contains only one value. It's advisable not
to set the value too close to 0 because this can result in the optimizer being overly optimistic
about a join of the column.

There are a number of approaches you can take to modify this value. You can use the
arithmetic average (number of distinct values/number of rows). Another method is to factor
the total density value by a multiple of 10. For every multiple of 10, add a zero to the right of
the decimal point.
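
For example, factoring the total density shown earlier in this chapter by 100 means adding two
zeroes after the decimal point in the optdiag file:

Total density: 0.2412580561291586

changed to:

Total density: 0.0024125805612916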

You can modify the total density manually via an optdiag file or use sp_modifystats. The
procedure allows you to modify the value in three ways. First, you can enter a specific value
to use. This is the method you'd use if you wanted to use the arithmetic average, for example.
You can also use it to specify a value by which to factor total density (as described above).
Finally, you can use it to set the total density to match the range cell density. This is generally
not advisable. In most cases where there are highly duplicated values, the range cell density is
very low, sometimes close to zero. Setting the total density value too low can cause the
optimizer to be too optimistic about the column. sp_modifystats can be used to change the
total density of a single column or a column group (see 'Multi-Column Density Values' earlier
in this chapter). See the ASE docs for specific syntax.

Deleting Column Level Statistics

We've talked about adding and modifying statistics. Another approach you can take is to
delete existing column level statistics. As we discussed earlier, adding statistics gives the
optimizer more information about your columns and indexes. However, there are times when
this added information may not be what you want to present to the optimizer. This is
particularly true if the columns involved are very dense, containing a high degree of duplicate
values.

Again, when there are no statistics on a column, the optimizer can only use hard-coded
assumptions about their selectivity based on the operator of the clause. These hard-coded
selectivity values may be more selective than would be the case if statistics were on the
column. Let's take a look at an extreme example - a gender column. A gender column would
have a selectivity of approximately 50%, or 0.50. If there are not statistics on the column and
it's used in an equi-SARG, the hard-coded selectivity used would be 10% or 0.10, much better
than 0.50.

If you've added statistics to dense columns and gotten less efficient query plans, you may
want to delete the statistics and run the queries again. If they run better, then don't put
statistics back on those columns.
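
The command itself is simple (a hedged sketch; the table and column names are hypothetical):

delete statistics customers (gender)
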
Note Before deleting statistics, it's a good idea to get an optdiag output file for the column.
That way, if you need to restore the statistics, you can read the file back in instead of
rerunning update statistics.

A Word on Statistics and Upgrading

When upgrading to 11.9.2 or above from an earlier version, you'll need to take the statistics
into consideration. It's a good idea to run update statistics after any upgrade.

If a column is the leading column of an index when upgrading from a pre-11.9.2 version, the
old distribution page is read and its values are used to establish the new statistics. This
essentially copies the old statistics values into the new values. This copy is not as accurate as
statistics obtained by reading the data in ASE 11.9.2 or above. For example, the values of the
steps in the distribution page are used to create the boundary values for the new histogram
steps. The weights are estimated by using the number of values that fall between each step of
the distribution page. These estimates are not as accurate as getting the boundary values and
weights from the data. Frequency count cells are created when a value appears in more than
one step in the distribution page. The number of requested steps to use in the new histogram is
the number of steps in the old distribution page.

After the upgrade completes, you should run update statistics on all your tables. This will
gather the statistics in the new format. For this first post-upgrade run, use the syntax you used
prior to upgrade.

You may have heard that you ought to run the update index statistics command, or even
update all statistics, after upgrading from a pre-11.9.2 version. While it's recommended that
you consider and test the effects of statistics on inner columns of composite indexes and/or on
non-indexed columns, you ought to do so after upgrading the statistics to the new format.
Hold off adding statistics until you have a chance to test them.

There has been some folklore about deleting statistics after upgrade and then running update
statistics. This is not necessary and can result in inefficient query plans. Since the number of
requested steps is taken from the distribution page, the resulting histogram will closely match
that of the distribution page. If you drop and recreate the statistics via update statistics, the
histogram will be built using the default of 20 steps. You should avoid changing the
granularity of the histogram until you've had a chance to test it against your queries.

The bottom line is that after upgrading to 11.9.2 or above from a previous ASE, do not delete
or add statistics until you've tested their effects on your queries. However, do run update
statistics as soon as possible after the upgrade completes.

Maintaining Added and/or Modified Column Level Statistics

As with everything in life, there's a trade-off with adding and/or modifying statistics. As we
all know, update statistics takes time to run, and the more statistics on the table, the more
maintenance that's required. As mentioned earlier, there will be more work that needs to be
done to update statistics on inner columns of indexes or on non-indexed columns than on
leading columns of indexes.
In most cases, the cost of maintenance will be outweighed by the more efficient plans that the
optimizer will generate. The added maintenance cost is another good reason to test the
effectiveness of adding statistics to columns. This added maintenance cost is also a very good
reason to rethink running update all statistics or update index statistics without testing first.

In pre-ASE 11.9.2 versions, update statistics had to read the leading column of an index and
gather the statistics from that column only. Since the column was in sorted order, there was no
need to do anything but the read (scan). In 11.9.2 and above, you can add statistics to any
column. If the column is not the leading column of an index, it needs to be sorted before the
read in order to gather the statistics. This adds I/O and time to the process. It will also require
space in tempdb to handle the work table for the sort. If the column is in an index, the size of
the work table will be the size of the index leaf pages, plus or minus a few pages. If the
column is not in an index, the work table will be the size of the table.

In 11.9.2 and above, an additional scan will need to be done to gather the cluster ratio
statistics. A table scan will need to be done if you specify only a table name or a table and
column name in update statistics. If you specify an index, an index leaf scan will occur,
except in the case of a clustered index on an all pages locked table where a table scan will be
done.

optdiag's Simulate Mode

As mentioned earlier, this section will discuss optdiag simulate mode. This material is
excerpted from a white paper I wrote in 1999.

optdiag's simulate statistics mode is designed to perform two functions - to allow technical
support to reproduce optimizer cases for users and to allow users to perform what-if analysis.
In both cases, the actual data from the database is not required. This saves a great deal of time
and resources. In this mode, optdiag can output a file that contains the same information as
other optdiag modes, along with additional information about some server level resources.

The optdiag simulate statistics output contains an actual and a simulated line for each statistics
value that can be simulated. When the file is read back in via the -i option, the simulated values
are written to sysstatistics in a special format that can then be used by the optimizer to estimate
the costs of queries using values other than the actual values. Below is an example. The actual
value is preceded by a # and thus will not be written to the system table; only the simulated
line will be written. Only values in rows labeled '(simulated)' can be edited. When changing
statistics, only change (edit) these values. The simulated values can be changed using any
text editor.

Data page count: 44308.0000000000000 (simulated)
# Data page count: 44308.0000000000000 (actual)

What Can Be Simulated?

All table, index, and column statistics can be simulated, along with cache sizes, configured
degrees of max parallelism, and largest partition size information. By manipulating these
values, you can test queries to see how variations affect query plans and performance.
Column level statistics can also be changed using optdiag simulate; they will be written to the
system table sysstatistics. You will need to take special steps to return column level statistics
to their original values (see the 'Simulating Column Level Statistics' and 'Removing
Simulated Statistics' sections later in this chapter).

The optdiag simulate statistics output files will contain all values that can be simulated. Most
of these values are obtained from the two system tables that hold statistical values -
sysstatistics (column statistics) and systabstats (table/index statistics). A handful of values are
gathered from elsewhere within the server. These include values on available caches and their
sizes, information on parallelism, and the size of the largest partition.

Simulating Values That Are Not Stored in the System Tables

Steps should be taken to ensure that the cache, parallelism, and partition values are properly
simulated in the test server. The first two are referred to as shared statistics. The ability to
change these values is unique to the optdiag simulate format.

Caches

Caches that appear in the optdiag simulate output file must be present in the test server before
the file is read in for simulation. If not, optdiag will fail and report an error. You will need to
be sure that all named caches you wish to use in the simulation, or those which are in the
output file you obtained from the source server, are created in the test server. They can be
created using the minimal amount of cache (512K). Also, if tables and/or indexes are bound to
named caches in the source server, or if you want to simulate objects bound to caches, you
must bind these objects to caches in the target server. If you are using any special cache
strategies, they must also be specified.
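
As a hypothetical sketch, preparing a test server might look something like this (the cache,
database, and table names are made up; 512K is the minimal cache size mentioned above):

-- create the named cache in the test server at the minimal size
sp_cacheconfig "trade_cache", "512K"
go
-- re-create any object bindings that exist in the source server
use testdb
go
sp_bindcache "trade_cache", testdb, dbo.orders
go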

In the example below, we are simulating a default data cache with a 25 MB 16K pool, while the
test server's actual default data cache has no 16K pool.

Size of 2K pool in Kb:     6902 (simulated)
# Size of 2K pool in Kb:   6902 (actual)
Size of 4K pool in Kb:        0 (simulated)
# Size of 4K pool in Kb:      0 (actual)
Size of 8K pool in Kb:        0 (simulated)
# Size of 8K pool in Kb:      0 (actual)
Size of 16K pool in Kb:   25600 (simulated)
# Size of 16K pool in Kb:     0 (actual)

Max Parallel Degree and Max Scan Degree

The simulated values for these two settings can be changed as you wish for what-if analysis.
If you are working with technical support on a case, please give your TS engineer all the
information about any session level settings for these values that you may have used with the
queries in question.
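
For reference, session level parallel settings are typically applied with set commands like the
following (the values are illustrative):

set parallel_degree 4
go
set scan_parallel_degree 2
go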

Max parallel degree:         12 (simulated)
# Max parallel degree:        1 (actual)
Max scan parallel degree:    12 (simulated)
# Max scan parallel degree:   1 (actual)

Partitions

If a table is partitioned, the size of its largest partition will appear at the end of the statistics
for table (on clustered index, if one exists) section of the optdiag output. Changing this value
will affect the costing of queries using parallelism. If a table is partitioned in the source
server, or if you want to simulate with partitioned tables, the tables in the test server must be
partitioned.
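
A minimal sketch of partitioning a test table to match the source server (the table name and
partition count are hypothetical, and the table must meet the usual requirements for
partitioning):

alter table lineitem_test partition 4
go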

Pages in largest partition:    5730.000000000000000 (simulated)
# Pages in largest partition:   573.000000000000000 (actual)

Simulating Column Level Statistics

When you change column level statistics via optdiag simulate, they are written directly to
sysstatistics; thus, they are not simulated with the other statistics. However, changes to
column level statistics will be used by the optimizer in the simulation. Keep in mind that to
set column level statistics back to their original values, you must either run update statistics
on the table or column or read in an optdiag file obtained before modifying the column level
statistics.

optdiag simulate in 'What-If' Analysis

The major advantage of optdiag simulate is the ability to perform 'what-if' analysis. In what-if
analysis, you can run queries against various statistics in order to determine how the changed
statistics will affect query plans. It is advisable to keep statistics as close as possible to the
values you expect to see in the actual dataset. If you only have a maximum of 500 MB of
memory available to you, it may not be very useful to simulate the use of a gigabyte of
memory.

Preparing the Simulation

Empty vs. Populated Datasets

It is tempting to simply create tables and indexes and then begin what-if analysis on these
empty tables. However, optdiag simulate output files from empty tables will not include
column statistics. You will need to get optdiag output files either from an existing dataset or
from a small version of the dataset you want to use in simulation. You'll need to make sure
that the statistics reflect your expectations of the distribution of data.

Query Outputs from an Existing Dataset

If you will be simulating an existing dataset in a source server with a set of queries that are
run on that dataset, you may want to get optimizer outputs from these queries run on the
source server. These output files will be essential for comparison against query plans
optimized using simulate statistics. Get the following outputs from the runs of your queries
and/or procedures:

showplan
dbcc traceon (3604, 302, 310)
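
A typical isql session for capturing these outputs might look like the following (the query
itself is just a placeholder):

dbcc traceon(3604, 302, 310)
go
set showplan on
go
select * from lineitem where l_orderkey = 1000   -- run your query or procedure here
go
set showplan off
go
dbcc traceoff(3604, 302, 310)
go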

Testing Queries Against Simulated Statistics

Once you've read in the simulated statistics via optdiag, you'll want to see how they'll affect
query plans. To do this, you will need to issue the following set command at the session:

set statistics simulate on

This set command tells the optimizer to use the simulated statistics rather than the actual
statistics.

If you are testing a stored procedure that has been previously compiled, you will need to
execute it with the recompile option and with set statistics simulate on; this will allow the
simulated statistics to be used when compiling the procedure.
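
For example (the procedure name and parameter are hypothetical):

set statistics simulate on
go
exec get_order_totals @cust_id = 42 with recompile
go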

It is advisable to use traceons 302 and 310 along with showplan to examine the optimizer's
behavior when using simulated statistics. I will present some examples using these in a
moment.

When using simulated statistics, there is no need to use set statistics io on. If you are
simulating on datasets that are empty or different from the source, the I/O that actually occurs
may differ greatly from the I/O on the source dataset. This could result in misleading
I/O outputs. For example, if the table you are working with has 100 pages and you decide to
simulate 10,000 pages, statistics io output will show that the cost of a table scan takes 100
I/Os and not 10,000.

You can use set statistics time on to see any differences in parse and compile time and the
corresponding server CPU time. However, the server elapsed time and total CPU time will
change when the simulated values become the actual values.

Verifying That Simulated Statistics Were Used

Both showplan and traceon 302 will report when simulated statistics are being used by the
optimizer.

traceon 302 example:

Statistics for this column have been edited.

showplan example:
Optimized using simulated statistics.

Removing Simulated Statistics

During your simulation, you may want to remove the changes you've made to simulated
statistics and return to the original values. For the shared statistics (the cache- and
parallelism-related statistics), use the following command to return to the original values:

delete shared statistics

This will remove the shared statistics from system tables in the master database.

Returning to the Original Column Level Statistics

There are two ways to return to original values after using optdiag simulate:

1. To get back to the original column level statistics, you will need to run update
statistics table_name to update statistics on all columns of the table that have statistics,
or you can run update statistics table_name (column_name) to update statistics on an
individual column. This could take a while if the table is large, so you may want to
consider the second approach.
2. Before beginning your simulation, get optdiag statistics files for each table. Copy
these and do your editing on the copies. After altering values and performing your
simulation, you can read the unedited files back in using the -i option of the optdiag
command, as shown in the sketch after this list. This approach will take less time than
updating the statistics on each table. This is also the easiest way to set the simulated
statistics back to their original values
in case you want to start over.
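
A minimal command-line sketch of the second approach (the server, database, table, and file
names are hypothetical): save the original statistics to a file before editing, then read the
unedited copy back in when you want to restore them.

optdiag statistics testdb..lineitem -Usa -Ppassword -STESTSRV -o lineitem_orig.opt
optdiag statistics testdb..lineitem -Usa -Ppassword -STESTSRV -i lineitem_orig.opt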

Chapter 6: ASE's Optimizer


This chapter does not attempt to give every detail and nuance of the optimizer. Rather, it's
intended to give you an overview of the steps the optimizer takes to make its decisions.
Understanding the basic processes of this important piece of ASE will help you to better tune
your queries.

What is the Optimizer?

ASE's optimizer, or query optimizer, is the part of the dataserver that determines the most
efficient method to use in order to access the data required by the query. It generates a query
plan based on what it has determined to be the cheapest access methods. The query plan is
then used to execute the query.

The optimizer determines whether an index or a table scan should be used to access a table,
what join order and join type to use, what size I/O to use, and how best to use parallelism.

The optimizer is 'cost-based.' Thus, all of its decisions on which access method to use are
based on the cost, usually in I/O, which the server has estimated as the total cost of data
access for the query.
This is pretty straightforward, but how does it do its job? How can I tune it, manipulate it, and
make it work even better for me? And why do I need to know all this stuff?

Terms and Definitions

Access/access method - How the data required by the query will be retrieved, such as via an
index scan, a table scan, etc.

Tree - A structure that holds a set of nodes. Each phase of query processing produces or alters
a tree.

Node - An instruction.

Query plan - The methods to use to access the data as determined by the optimizer.

SARG - A search argument; for example, where colA = 100.

SARG value - In the above example, the SARG value is 100.

Join - A clause that joins two tables; for example, where tabA.colA = tabB.colB.

OR - A clause containing an OR; for example, where colA = 100 OR colA = 200.

Why Learn about It?

Basically, the more you know, the easier and more efficient your performance and tuning
work will be. Also, the more you know about how it works now, the more you'll be able to
take advantage of new features and functionality as they become available.

Query optimization and processing are the 'final frontiers' of relational database research in
the academic world. Optimization is where most changes to dataservers are currently
happening.

What's the Optimizer's Place in Query Processing?

The optimizer is only one piece of query processing, albeit a big piece. When a query is sent
to ASE, it passes through four basic areas of query processing. Let's take a look at them.

The first thing the query encounters is the parser. The parser checks the syntax of the SQL
and returns any errors if necessary. If there are no errors in the syntax, it will, using the rules
of the language, translate the SQL into a 'parse tree.' This is sent to the next phase.

The next phase is the normalizer. The normalizer checks for the existence of the referenced
objects, changes their names in the tree to their object IDs, rearranges ANDs and ORs in the
tree, and changes NOT predicates to != predicates. The normalizer also resolves the datatype
hierarchy; it is here that datatype mismatches are detected. If the normalizer decides that a
datatype mismatch cannot be resolved, it removes the predicate from the tree. The predicate
will still be executed, but it will not be passed to the optimizer, and what the optimizer doesn't
see, it doesn't estimate a cost for. The normalizer produces the 'normalized tree' (also known
as the query tree) and passes it to the pre-processing phase. The normalizer also performs
resolution.

The pre-processing phase is relatively small, but it does some important work. It resolves
views by merging the view definition into the rest of the query tree. This is also where
aggregates are resolved and where subquery transformations, which apply the newer subquery
processing features, are done.

As mentioned earlier, the optimizer will estimate the cost of all possible access methods based
on the query. We'll go into how this is done in more detail soon. The optimizer produces the
query plan. The query plan is a structured set of instructions on how to execute the query. The
optimizer performs compilation.

The query plan is passed to the execution engine. The execution engine is responsible for
executing the steps in the query plan and returning the result set. The execution engine works
with various 'managers' within ASE to get the needed data.

The Phases of Optimization

The optimizer uses two phases to find the cheapest query plan.

The first of these phases is the prep phase (index selection phase). This phase estimates the
cost of using an index to access the table in comparison to a table scan. It does this for each
table in the query. Index selection can be done three times for a table, once for all SARGs,
once for all joins it's involved in, and if there's an ORDER BY statement, once to determine if
there's an index that can be used to avoid a sort.

The second phase is the search engine. This phase is often mistakenly referred to as the 'join
costing phase,' but many more decisions than just the join order are made here.

Let's take a look at what goes on in each of the phases of optimization.

The Prep Phase (Index Selection)

The prep phase estimates the cheapest access method, either an index or a table scan, for the
SARGs and joins against the table. It will also estimate the cost of using an index to avoid the
sort of an ORDER BY. To do this, the optimizer goes through several steps.

The first step is to check to see if the table is a temp table, a work table, or a table that's
created in a batch query (not in a stored procedure). If the table is one of these, the optimizer
has to use heuristics (hard-coded assumptions) for the number of rows and pages it contains.
The values used are 100 rows on ten pages.

If the table is a 'real' (user) table, we move on to get the table/index level statistics. If the
statistics are not in memory, they are read into memory from systabstats. The optimizer will
then read them from the memory copy. For more on the table/index statistics in memory, see
Chapter 5, 'The Optimizer Statistics.'

Now that the table/index level statistics are available, the base cost can be established. The
base cost is the cost of a table scan. This value is necessary to compare all other accesses of
the table to. It is seen in traceon 302 output as 'Table scan cost' in the 'Base cost block' or
second section of the output:

*******************************
Beginning selection of qualifying indexes for table 'lineitem',
correlation name 'l', varno = 0, objectid 240003886.
The table (Datarows) has 600572 rows, 44308 pages,
The table's Data Page Cluster Ratio 1.000000

Table scan cost is 600572 rows, 44489 pages,


using no data prefetch (size 16K I/O),
in data cache 'default data cache' (cacheid 0) with MRU replacement

The size of the I/O for the table scan will be indicated, along with the cache it's coming from;
in this case, it's 16K I/O from the default data cache.

Now that the base cost has been established, the predicates (WHERE clauses) can be
examined. The optimizer looks for LIKEs, JOINs, SARGs, and ORs, in that order. Once
found, it estimates the cost of each in this order - SARGs/LIKEs, ORs, and JOINS. If a LIKE
statement is found, it is transformed into two range SARGs (a between).

At this point, if it's on, join transitive closure is applied. Let's say you have a join like this:

where T1.c1 = T2.c2 and T2.c2 = T3.c3

Intuitively, we know that T1.c1 = T3.c3. Join transitive closure adds this join and gives the
optimizer more join plans to examine. SARG transitive closure is also applied.
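
For example, SARG transitive closure means that in a hypothetical query like the one below,
the SARG on T1.c1 is also applied to T2.c2, giving the optimizer a restrictive clause on both
tables:

select *
from T1, T2
where T1.c1 = T2.c2
and T1.c1 = 100
-- transitive closure effectively adds: and T2.c2 = 100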

Once the predicates have been found, they are arranged for costing. SARGs are costed first.
The SARG value is checked to see if it's known at this point by the optimizer. If the SARG
value is not known, either the default selectivity values or hard-coded values have to be used
as the selectivity.

Unknown SARG values most commonly occur when local variables are used in a batch query
or a stored procedure (the use of some functions can result in unknown values). If a stored
procedure uses parameters, the SARG values are known and can be used in optimization.
Let's take a look at what you'll see in 302 output when there are unknown values:

SQL example:

1> declare @var1 int
2> select @var1 = 2000
3> select * from li2
4> where a = @var1
5> go

The value of @var1 is unknown to the optimizer at compile (optimization) time.

If an unknown SARG value is present, a selectivity value based on the operator of the clause
will be used. If there are no statistics on the column, the only choices possible are 0.10 (10%)
for an equi-SARG (col = value), 0.33 (33%) for a range SARG (<, <=, >, >=), and 0.25 (25%)
for a between SARG. When there are no statistics on the column, the optimizer has no other
choices. When there are statistics on the column, there's a bit more flexibility. When statistics
are created on a column, the default selectivity values for RANGE and BETWEEN clauses
are created. The total density value is used as the default selectivity for equi-SARGs. Here's
an example optdiag output:

Statistics for column:                  "a"
Last update of column statistics:       Oct  5 2001 11:59:41:323AM

     Range cell density:                0.0000083294442894
     Total density:                     0.0000083294442894
     Range selectivity:                 default used (0.33)
     In between selectivity:            default used (0.25)

The two default selectivity values and the total density can be changed to any value you like.
However, caution should be taken when changing the total density value. See Chapter 5, 'The
Optimizer Statistics' for more details.

Below is 302 output from our example query above when there are statistics on the column.

Selecting best index for the SEARCH CLAUSE:
li2.a = unknown-value

SARG is a local variable or the result of a function or an expression,
using the total density to estimate selectivity.

Estimated selectivity for a,
selectivity = 0.000008.

As you can see, the SARG value is marked 'unknown-value,' followed by a message
describing why. In this case, the column selectivity is the total density (slightly rounded off).
If there were no statistics, the column selectivity would be 0.100000, a considerable
difference. If this had been a range or in-between SARG, the respective default selectivity
value would have been used.

If the SARG value is known, the optimizer then needs to read the column level statistics
(distribution statistics) to estimate selectivity. If there are statistics on the column, they are
read into procedure cache - only the density values and the histogram cells (steps) are read
into cache. The optimizer can only read the statistics from cache. Once optimization is
complete, the statistics are removed from cache.

No statistics available for colB,
using the default equality selectivity to estimate selectivity.

Estimated selectivity for colB,
selectivity = 0.100000.

As you can see, column colB does not have any statistics on it. In this case, the optimizer has
to use a hard-coded default selectivity value based on the operator (in this case an equality).
For a range SARG, the value used will be 0.33. For a between, it will be 0.25. These values
cannot be changed. When you see the 'No statistics available' message, you should consider
adding statistics to that column; it will give the optimizer accurate information about the
column, making cost estimates more accurate.
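
Statistics for a non-indexed (or non-leading) column can be created with update statistics; a
minimal sketch (the table name is hypothetical):

update statistics tabA (colB)
go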

Estimate Column Selectivity

Now that the column level statistics or the default selectivity values are available, the
optimizer can begin estimating the selectivity of the column or columns of the table that are in
the query. Column selectivity is used to help make index selection more accurate.

Selectivity is a description of how useful an access method will be when getting the data
needed for a given SARG or join. In general, the more highly duplicated values there are, the
denser the column or index is and the less selective it will be. Estimated selectivity is
dependent on the clause's operator, the search value(s), and the statistics. In traceon 302
output, the closer to 1 a selectivity value is, the less selective the column and/or index.

Estimated selectivity for colA,
selectivity = 0.010473, upper limit = 0.015088.

In the above example from traceon 302 output, we see that the selectivity for column colA has
been estimated. The 'selectivity' value is the range cell density of the column. The 'upper limit'
is the weight of the cell or cells that the SARG value(s) fall into. The upper limit will only be
used when a range SARG results in an estimated selectivity that is greater than the upper limit
value. Otherwise, the selectivity value will be used as the column's selectivity.

The SARG value(s) need to be compared to the histogram. Since we know the type of
predicate in the clause(s), we can move on to the next step. We determine where the value(s)
fall within the histogram. The operator of the clause and the histogram will then be used to
estimate column selectivity.

If the clause is an equi-SARG and the value falls into a frequency count cell (see Chapter 5
for more info on cell types), then the weight of the cell is used as the column selectivity. If the
value falls into a range cell, either the range cell density or the cell's weight will be used as the
column selectivity, whichever is smaller. If you look at 302 output, you'll see these values
reported as the column's 'selectivity' (range cell density) and 'upper limit' (the cell's weight).

If the clause is a range SARG (<, <=,>,>=, BETWEEN) and the value(s) fall into a frequency
count cell, again use the cell's weight for the column selectivity. If the value(s) fall into a
range cell, the optimizer has to estimate how much of the cell's rows will qualify for the
SARG. It does this by measuring how close to the boundary of the cell the value(s) fall. Once
it's determined this, it then uses that fraction of the cell's weight. If the range SARG values
span cells, the interpolated weights are added to get the selectivity.

If the SARG value(s) fall outside of either end of the histogram, that is, the value is either less
than the minimum value in the histogram or greater than the maximum value in the histogram,
special costing is done by the optimizer. Depending on the operator and which end of the
histogram the value falls outside of, the selectivity will be either 0.00 or 1.00. If the operator
is an equality (=) and the SARG value falls outside of either end of the histogram, the
selectivity will be 0.00. If the operator is a less-than-or-equal-to (<=) and the SARG value is
less than the minimum value in the histogram, the selectivity will be 0.00. With the same
operator, if the SARG value is greater than the maximum value in the histogram, the
selectivity will be 1.00. If the operator is a greater-than-or-equal-to, the selectivity values will
be the opposite (that is, 0.00 if the SARG value is greater than the maximum histogram value
and 1.00 if it's less than the minimum histogram value). See Chapter 5 for more details on the
histogram values.
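
As a hypothetical illustration, suppose the histogram on colA covers values 1 through 1000:

select * from tabA where colA = 5000    -- equality outside the histogram: selectivity 0.00
select * from tabA where colA <= 0      -- <= below the histogram minimum: selectivity 0.00
select * from tabA where colA <= 5000   -- <= above the histogram maximum: selectivity 1.00
select * from tabA where colA >= 5000   -- >= above the histogram maximum: selectivity 0.00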

Estimate Index Selectivity

Once the column's selectivity has been established, it can be used to help estimate the cost of
accessing qualified indexes. The use of column selectivity to aid in index costing is new in
ASE 11.9.2 and above.

An index is qualified for costing if it satisfies one of a number of conditions. Of course,
indexes on columns that are not referenced in the query are not qualified.

An index is qualified for costing if:

• The force index option is being used - There's no reason to go any further in index selection.
• The index is unique - A unique index is a very cheap way to access the data, since
reads will only have to be done down the index's tree to the datarow.
• The index covers the query - If the index 'covers' the query, no datapages will need to
be read. This is a cheap access.
• It's a clustered index on an APL table - The clustered index should be costed because
of its physical sorting. This only applies to APL tables though.
• There's an ORDER BY on the leading column of the index - This index may be useful
to avoid the creation of a work table for the necessary sort.
• The index can be used to perform an aggregate function.
• The index can be used for an updatable cursor.

Now that the qualified indexes have been determined, we can begin estimating the cost of
using them. As mentioned earlier, the estimated column selectivity values are used to help
estimate the cost of using an index.

The 'scan' and 'filter' selectivity values are the primary values used in index cost estimation.
The scan selectivity value is used to estimate how limited the scan of the index will be. It is
the number of leaf rows and index pages that will be read. It is determined by the SARGs on
the leading column of the index. The filter selectivity is used to estimate how many logical
I/Os will be used to read the datarows from the index. It's based on SARGs on all columns of
the index. Basically, these values are estimating costs at different levels of the index. Let's
take a look at an example.

Let's assume that the table has an index which consists of three columns (sex, room, temp). Of
course, this is a rather extreme example - using an index with a gender column as its leading
column.

Our query for this example is:

select * from tabA where sex = "M" and temp > 105

The SARG 'sex = "M"' can be used to limit how much of the index needs to be read. It will
help determine where the index scan begins and ends; this is what the scan selectivity is
measuring.

The SARG 'temp > 105' can be used to limit the number of datarows that need to be read and,
thus, the number of logical I/Os. In other words, it's helping to filter the index scan even
further; this is what the filter selectivity is measuring.

The cluster ratios of the indexes are used to determine two things - the cost of using large I/O
and the cost of reading rows on the datapages from the leaf rows of the index. See Chapter 5
for details on what each of these values is measuring. If large I/Os are available, and the
cluster ratios indicate that their use will be efficient, the optimizer will add this to the costing
of the index. The closer the index and/or datapage cluster ratios are to 1, the better clustered
the index and table are and the more efficient large I/O will be. If you're running ASE 12.0 or
higher, optdiag reports 'Large I/O efficiency.' See Chapter 5 for more details on this value. It's
possible that one size I/O can be used to access the index pages while another one is used to
access the datapages. You can see this in the Best Access Block (the last section of
information in traceon 302 output). The MRU/LRU strategy is also determined at this point.

Once the indexes have been costed, the optimizer then looks for an index or indexes that can
be used to avoid the sort that is forced by an ORDER BY. If there's an ORDER BY in the
query and there's an index on the ORDER BY column, the index will be costed to see if it
would be cheaper to use than to do a work table sort.

A Note on Costing LIKEs

A LIKE will be transformed into a range SARG. It will then be costed the same as any range
SARG. Here's an example:

where colA like "DEL%"

This LIKE will be transformed into the range SARG seen in the 302 example below:

Selecting best index for the SEARCH CLAUSE:


table.colA < 'DEM'
table.colA >= 'DEL'

Costing Indexes for Joins

Index selection for join costing follows SARG costing. Here, the total density value is the
principal value used to estimate the cost of using an access for the join. The histogram
(distribution statistics) is not used in estimating the cost of joins. However, the estimated
selectivity of SARGs on the table is also used to help estimate the cost of a join. The total
density and the estimated SARG selectivity are compared to find the join selectivity.

Costing ORs

There are three possible access methods for processing ORs - a table scan, dynamic OR
processing, and special OR processing.

A table scan is self-explanatory. In dynamic OR processing, the optimizer estimates the cost of
using an index to get the row IDs, move them into a work table, sort them, and scan them.
This is called a
'dynamic index' in showplan output.

In special OR processing, the cost of accessing a non-clustered index for each OR clause will
be estimated. Often, this is the most efficient way to process ORs. However, non-clustered
indexes are required for special ORs.

Index Selection Completion

At this point, index selection for all SARGs and joins of the table(s) is complete. The cheapest
estimated accesses for SARGs and joins on each table have been determined. Let's take a look
at the Best Access Block of traceon 302 output:

The best qualifying index is 'test_nc' (indid 2)
costing 22243 pages,
with an estimate of 16493 rows to be returned per scan of the table,
using no index prefetch (size 16K I/O) on leaf pages,
in index cache 'default data cache' (cacheid 0) with MRU replacement
using no data prefetch (size 2K I/O),
in data cache 'default data cache' (cacheid 0) with MRU replacement
Search argument selectivity is 0.032987.

*******************************

In the example above, the optimizer has determined that the non-clustered index test_nc
would be the most efficient way to access this table for either the SARGs or joins on it (in this
example, it's a SARG). It has also determined that it would be most efficient to read the index
pages using 16K I/O while reading the datapages in 2K I/O. In both cases, the MRU
replacement strategy is estimated to be the best.

The best qualifying access is a table scan,
costing 9435 pages,
with an estimate of one row to be returned per scan of the table,
using no data prefetch (size 16K I/O),
in data cache 'default data cache' (cacheid 0) with MRU replacement
Join selectivity is 0.000007.

*******************************

In the example above, the optimizer has determined that a table scan using 16K I/O is the
most efficient way to access the table (in this case, to perform a join).

At this point, index selection is complete. However, it's not the final word on the query plan
that will be executed. The index selection decisions made in this phase are then passed to the
next phase, the search engine. More decisions are made there. This has been a source of
confusion from time to time for those who are new to reading traceon 302. Since the Best
Access Block in the 302 output begins with 'The best qualifying … is…,' it is often taken as
the final decision. It's only the final decision within the costing of qualified indexes.

The Search Engine Phase

The second phase of optimization is the search engine, sometimes mistakenly called 'join
costing.' All queries must pass through the search engine because there are several decisions
made in this area.

The best accesses determined in the prep phase are passed into the search engine. There is no
further index selection at this point. However, an index access that was found to be cheapest
in the prep phase can be 'demoted' to a table scan in the search engine. This usually happens
because of poor clustering. The best access methods from the prep phase are used in the
search engine to estimate the cost of plans.

Physical and Large I/O Estimation

For all queries, the search engine estimates the amount of physical I/O that must be done. It
will also estimate the cost and usefulness of large I/O. Large I/O costing is done after physical
I/O costing.

The search engine will determine if the object (table or index) is bound to a cache, if it will fit
into cache, and what the probability is that a significant number of its pages are currently in
cache. These estimates, in combination with the cluster ratios, are used to produce the
physical I/O estimates. The estimates may be higher than the actual physical I/O because of
clustering and the number of pages that are actually in cache.

Join Costing

The search engine permutes through all possible join orders, costing all possible plans based
on the join orders and on the access methods from the prep phase. Another term for
permutation is join order, the order in which the tables will be joined. Let's take a quick look
at a simple join as an example:

select ...
from tabA, tabB
where tabA.c1 = tabB.c1
and tabA.c2 = 10
and tabB.c2 = 100

Here we have two possible permutations (join orders) - tabA-tabB and tabB-tabA. If there's a
qualifying index on each table for the query, we can add two more plans: for each of the two
join orders, both a table scan and an index access will be costed. More plans will be
considered if large I/O is involved. The more possible access methods, the more plans to cost
for each join order.

The total number of join orders depends upon the number of tables in the join. The number of
join orders will be N!, where N is the number of tables in the FROM clause. For a six-table
join, it works out to 6 x 5 x 4 x 3 x 2 x 1 = 720 possible join orders. This is only the number
of join orders considered; it does not include other plans such as index accesses and large I/O.
Also, if 'sort merge join' is on, there will be additional plans to examine.

Each table will be costed in the search engine using the best access sent from the prep phase
and table scan. As mentioned earlier, large I/O will also be added to the costing if it's
available. If parallelism is set, parallel access methods will also be costed along with the join
orders and other access methods.

Join orders are costed in sets of four tables at a time. So, let's say you have a join of five
tables - A, B, C, D, and E (in that order in the FROM list). Tables A, B, C, and D will be
costed together. The best outer table will be determined and 'saved off.' Let's say that table B
was determined to be the best outer table. Table E will now be brought in and costed with the
remaining three tables. The best outer table will again be determined and saved off. This
continues until the best join order is found.

There is a possibility that a more efficient plan may be overlooked when there are more than
four tables in the join. You can specify the number of tables to be examined at a time by
setting the number with set table count. This may add a bit of time to the costing of plans, but
in many cases, it would be a good trade-off if a more efficient plan can be found.
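
For example, to have the optimizer consider all five tables of the join above at once:

set table count 5
go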

The optimizer will examine all plans and reject those that are more expensive than the
cheapest one. If a cheaper one is found, it will be marked as the cheapest and the costing of
plans will continue until all have been costed. At that point, the cheapest plan (join order and
access method) will be used as the final plan. This also applies to queries without joins; they
will also pass through the search engine, and a final plan will be determined. If you're on ASE
12.0 and above and have the abstract plan capture on, the AP will be written to sysqueryplans
at this point.

The query plan is now sent to the execution engine to be run. At this point, optimization is
complete.

We've now done an overview of how the optimizer works. I hope this information has helped
you better understand the optimizer and its place in the larger scheme of query processing.
Knowing how the optimizer works is an important part of performance tuning.

Chapter 7: Cache Strategies


Sybase Memory Management

The efficient deployment of appropriate amounts of memory is key to achieving optimal
performance. Having available memory will reduce, or even eliminate, the need for physical
I/O. This, in turn, can enable a quantum improvement in response time, since memory access
is always much faster than disk access. However, it's not just a matter of bringing up a server
with tons of memory; it's also how you deploy and allocate the memory resources you have.

There are many configuration parameters that can bleed away your memory, leaving you with
a far smaller data or procedure cache than anticipated. It is crucial for you to understand your
application as you determine its server's memory configuration. Are they expecting to have
multiple concurrent processes using large packet size? What's your default network packet
size? What are the stack, stack guard, and ULC parameters set to for your logins, and how
many logins are you configured for? If you're contemplating using the larger logical page size
and (new) wider columns supported by 12.5, don't forget about your heap memory per user
configuration parameter when projecting your anticipated data and proc cache sizes. Are you
going to do a lot of parallel processing? Basically, it's not only what you have, it's how you
use it. Let's step back and start at the beginning: How does Adaptive Server 12.5 allocate and
use memory?

Memory Allocation

In Adaptive Server 12.5, we now address memory from two perspectives - logical memory
and physical memory. Total logical memory is a read-only parameter that indicates the total
memory requirements for all of your configuration parameters, including the data and
procedure cache. Put differently, it tells you how much memory is required to deploy a given
server configuration. This amount of memory must be available to the Adaptive Server at
startup time, though it can be quickly changed (upward) once the server is running through
the adjustment of dynamic configuration parameters.

If you execute sp_configure for this or any other configuration parameter, the config value is
the memory requirement at boot time, while the run value shows the current memory
requirement. Reviewing the sp_configure output for these parameters will enable you to track
changes since boot time. In addition, the total logical memory parameter can help when
performing capacity planning and server rollout. For example, all trading servers could be
built using a prototype config file (ASE12.5_trde.cfg) with a memory requirement of nn
megabytes of memory.

Total physical memory is the amount of memory Adaptive Server uses at any given moment
or, put another way, the sum of all of Adaptive Server's shared memory segments. In 12.5, the
new read-only parameter total physical memory can be queried to indicate the current value.
For this parameter, the config value is always set to zero while the run value shows current
usage. Once the server is booted, the run value will only increase, even if parameters are
adjusted downward after boot time since memory is only released when the server is recycled.
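
Both values can be checked from isql; a minimal sketch:

sp_configure "total logical memory"
go
sp_configure "total physical memory"
go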

Where Does All the Memory Go?

Let's briefly cover where all the memory goes. Initially, the size of the executable code is
included in, or subtracted from depending on your perspective, your total logical memory.
The amount of memory allocated to code and overhead varies by platform and release, but
you can assume it's within the 6 to 8 MB range. To get the exact value, execute sp_configure
for the executable codesize + overhead parameter. Please note that the size of code plus
overhead will increase if you enable CIS (Component Integration Services).

A common rule of thumb for sizing the procedure cache is:

Procedure cache size = (Max # of concurrent users) * (Size of largest plan) * 1.25

Minimum procedure cache size needed = (# of main procedures) * (Average plan size)

The number of open databases, number of open indexes, and number of open objects
parameters are managed by metadata caches. These caches reside in the kernel and server
structures section of Adaptive Server memory.

There are other groups of server configuration parameters that have the potential to use up
significant amounts of memory. The amount of memory used per user connection will vary
across platforms and is a function of the values of several configuration parameters, which are
listed below. The memory that parameters like stack size and stack guard size use is figured
into the calculation of total memory for the parameters they contribute to; that is, the user
connections and worker processes. This is indicated by the # which appears in their memory
used column.

• default network packet size
• stack size and stack guard size
• user log cache size
• heap memory per user

The number of locks parameter can become an overlooked memory drain in Adaptive Servers
which have heavy use of data-only locked tables. Adaptive Servers that perform parallel
processing can also require potentially significant amounts of memory just to support these
activities, as each worker process requires an amount of memory comparable to a user
connection. The total amount of memory required by a worker process is determined by the
following parameters:

• default network packet size
• stack size and stack guard size
• user log cache size
• memory per worker process

The memory per worker process parameter controls how much additional memory is placed in
a pool that is made available to all worker processes. Within this pool are stored
miscellaneous data structure overhead and inter-worker process communication buffers.
Parallel processing also has an indirect impact on procedure cache size. Each worker process
makes its own copy of the query plan in space claimed from the procedure cache, while the
coordinating process keeps two copies of the query plan in memory. So, you should always
factor this in when calculating procedure cache size.

Memory Management Changes in Adaptive Server 12.5

We have already discussed that in Adaptive Server 12.5, memory is viewed from both a
physical and logical perspective. The next change in Adaptive Server 12.5 that we need to
discuss is the max memory parameter and its role in dynamic memory allocation.

Max memory is a new, dynamic memory configuration parameter that controls the maximum
amount of shared memory that Adaptive Server is allowed to allocate. Max memory will
typically be set to a value that is greater than the total logical memory the server requires to
boot up. This enables you to increase dynamic parameters while the server is running. Along
with this change, a number of parameters that were static in nature (that is, the server needed
to be recycled before any changes made to them would take effect) have been changed to
dynamic parameters in Adaptive Server 12.5. This means that these parameters can be
changed while the server is operational and, assuming that total logical memory is currently
less than total max memory, additional memory will be allocated so that the changes can take
effect without requiring a server recycle. An Adaptive Server can be configured to allocate the
full max memory value at startup, as opposed to only the total logical memory, by setting the
allocate max shared memory parameter to 1. With the default of 0, only the total logical
memory requirement is allocated at startup.

Assuming that max total memory exceeds the total logical memory, dynamic parameters can
be adjusted while the server is operational, and the change will take effect immediately. You
can control, however, whether all the additional memory required by the parameter change is
allocated immediately or gradually. Typically, Adaptive Server will allocate additional shared
memory segments when there is a demand for them. For example, if you change the number
of user connections from 100 to 150, Adaptive Server will wait until the number of
connections exceeds the previous limit of 100 before it allocates memory for them. However,
if you set the dynamic allocation on demand parameter to a value of 0, then any and all
additional memory required by a parameter change is allocated during the reconfiguration
itself. If there is insufficient max memory to do so, an error will be reported. You can then
attempt to increase the max memory parameter and try increasing the parameter again.
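
For example (the values are illustrative; in 12.5, memory parameters such as max memory are
generally expressed in 2K units):

-- raise the ceiling on the shared memory ASE may allocate
sp_configure "max memory", 200000
go
-- allocate all memory for a parameter change at reconfiguration time rather than on demand
sp_configure "dynamic allocation on demand", 0
go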

When additional memory segments are allocated, the size of these segments is determined by
an internal algorithm. There is also a maximum limit of 256 on the number of shared memory
segments that any Adaptive Server can allocate. Attempts to allocate more will fail, and the
server will need to be recycled so that the Adaptive Server can allocate a smaller number of
larger memory segments.

The final memory management change impacts how the procedure cache is sized. In releases
prior to 12.5, the procedure cache percent parameter sized the cache as a percentage of
available memory, with the rest going to the data cache. In Adaptive Server 12.5, the new,
dynamic parameter procedure cache size enables you to specify a specific size, in megabytes,
for the procedure cache.
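
For example (the value is illustrative):

sp_configure "procedure cache size", 25600
go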

The following are the changes to ASE 12.5:

• Approximately 42 static configuration parameters were changed to dynamic configuration
parameters.
• Shared memory is allocated dynamically.
• Dynamic reconfiguration is part of the high availability group of features.
• Logical memory - the total memory required for current configuration.
• Physical memory - the sum total of all shared memory segments allocated in ASE
(@@tot_memory).
• Max total memory - the maximum size of physical memory ASE can allocate.
• In the ideal environment, the relationship of the three should be:

physical memory < logical memory < maximum total memory

• Default data cache size must be specified as an absolute value.
• Total memory is a read-only internal parameter indicating logical memory for current
configuration.
• Procedure cache size added.
• Changing configuration parameters affects total memory size.
• Number of engines at startup added.
• sp_engine is a new interface to dbcc engine().

The following parameters are changed from static to dynamic:

• audit queue size
• disk i/o structures
• max cis remote connections
• memory per worker process
• number of alarms
• number of aux scan descriptors
• number of devices
• number of dtx participants
• permission cache entries
• process wait events
• number of large i/o buffers
• number of locks
• number of mailboxes
• number of messages
• number of open databases
• number of open indexes
• number of open objects
• number of worker processes
• partition groups
• size of global fixed heap
• size of process object heap
• size of shared class heap
• size of unilib cache
• additional network memory
• plan text pipe max messages
• statement pipe max messages
• errorlog pipe max messages
• deadlock pipe max messages
• sql text pipe max messages

Spinlock ratio configuration parameters:

• open index hash spinlock ratio
• open index spinlock ratio
• open object spinlock ratio
• user log cache spinlock ratio
• partition spinlock ratio

Non-memory related configuration parameters:

• timeslice
• cpu grace time
• default database size
• default fill factor percent
• number of pre-allocated extents
• tape retention in days
• print recovery information

Deleted configuration parameters:

• procedure cache percent
• max roles enabled per user
• min online engines
• number of languages in cache
• freelock transfer block size
• max engine freelocks
• engine adjust interval
• max cis remote servers

Memory and Performance

Many performance issues can be eliminated with efficient memory management. Depending on
the platform and the memory used by other OS-level tasks, the memory allocated to the caches
plays a major role in performance. More memory reduces disk I/O because data and index
pages can be found in cache instead of being read from disk. Whenever a query is issued, if
the required data is already in memory, or can be read into memory, the response is much
faster; if the data is already in memory, Adaptive Server performs no disk I/O at all.

Giving more memory does not end the performance issue. Poor use of memory allocation can
also add to the problem. Allocating the data cache poorly is as bad as not having enough total
memory. The issues with performance can be due to the following:

• The total data cache size is too small.
• The procedure cache size is too small.
• Only default cache is defined and all the CPUs are contending for that data cache.
• User-configured data cache sizes are not appropriate for specific user applications.
• Configured I/O sizes are not appropriate for specific queries.

Procedure Cache

The query plans of a stored procedure are kept in memory based on how frequently the
procedure is used. Depending on its use, a query plan sits somewhere on a chain of plans
running from the MRU (most recently used) end to the LRU (least recently used) end.
Whenever an execute command for a stored procedure is issued, Adaptive Server looks in the
procedure cache for a plan to use. If a query plan is available, it is moved to the MRU end of
the chain. However, if no plan exists, or if all copies are in use, the query tree has to be read
from the sysprocedures table; a new plan is then optimized, based on the parameters supplied,
and executed.

Query plans that sit at the LRU end of the chain wait to be used; if they are not used, they
slowly age out of the procedure cache. The following
figure illustrates how the procedure cache handles the query plans with the MRU/LRU end of
the procedure cache chains.

The memory allocated for the procedure cache holds the optimized query plans (and
occasionally trees) for all batches, including any triggers. There will be multiple copies of the
procedure and trigger in the cache if more than one user requests it. If the procedure cache is
too small, an error message (error 701) reporting insufficient procedure cache is returned, and
the user will be able to execute only after memory in the procedure cache frees up.

Getting Information about the Procedure Cache Size

When the server is started, messages about procedure cache allocation are written to the
errorlog (the file named by the -e option of the dataserver command in the server's RUN file).

00:00000:00000:2001/05/02 15:22:05.21 server Number of proc buffers allocated: 60604.
00:00000:00000:2001/05/02 15:23:11.92 server Proc header memory allocated 7575 pages for each per engine cache
00:00000:00000:2001/05/02 15:23:11.94 server Number of blocks left for proc headers: 64548.
00:00000:00000:2001/05/02 15:23:11.94 server Memory allocated for the default data cache: 1028474 Kb

• proc buffers - The number of proc buffers represents the maximum number of
compiled procedural objects that can reside in the procedure cache at one time. No
more than 60604 compiled objects can reside in the procedure cache simultaneously.
• proc headers - Proc headers represents the number of 2K pages dedicated to the
procedure cache. In the above example, 7575 pages are dedicated to the procedure
cache. Each object in cache requires at least one page.

Procedure Cache Sizing

How big should the procedure cache be? On a production server, you want to minimize the
procedure reads from disk. When a user needs to execute a procedure, Adaptive Server should
be able to find an unused tree or plan in the procedure cache for the most common
procedures. The percentage of times the server finds an available plan in cache is called the
cache hit ratio. Keeping a high cache hit ratio for procedures in cache improves performance.
You can use the sp_sysmon reporting procedure to track your procedure cache hit ratio.
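
For example, a ten-minute sample whose Procedure Cache Management section reports the hit
ratio (the interval is illustrative):

sp_sysmon "00:10:00"
go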

Data Cache

Once the procedure cache value is sized, the remainder of the memory is allocated within the
one or more data caches defined within the server. Allocating the memory is important
because this is the area where the data is read from the disk. When sufficient memory is
available in the data cache, data will remain in the cache longer. This will improve
performance because less disk I/O will need to be performed.

When the server is first installed, there is a single data cache - the default data cache - one
cache area that holds all data, index, and log pages for all server processes.

Data that is read into the data cache area is kept until it ages and/or more data is read into the
data cache. The Adaptive Server data cache handles this with the MRU/LRU strategy. Once
the data ages, it moves into the wash area where any dirty pages are written to disk. Dirty
pages are those pages that have been modified while in memory.

Workload Mix

Tuning requires a great deal of analysis for the application types. There are three major types
of workload, each with their own unique characteristics: OLTP (online transaction
processing), DSS (decision support system), or a mixed workload.

OLTP transactions involve high-frequency inserts, updates, and deletes. They also tend to use
more physical I/O and are prone to contention when not correctly designed. DSS, on the other
hand, has a lower frequency of data modification; for large updates and inserts, the bulk copy
utility (bcp) can help. Ad hoc queries written without any thought to tuning often use a
number of table joins that can hurt performance. A mixed workload combines both, using the
same dataset for both simple and complex queries.

Named Cache

Adaptive Server offers numerous ways to design data caches to address application
performance issues and system performance issues like last page contention.

When Adaptive Server is installed, it creates a single default data cache with a 2K memory
pool. This cache also has a single spinlock, meaning it has one cache partition. Splitting the
cache into multiple named caches is often necessary to improve performance. Once the data
cache is split, database objects or entire databases can be bound to the named caches.

The buffer pools for the named caches can also be configured for performance. Buffer pools
can be configured to 4K, 8K, and 16K buffer pools for both user-defined data caches and
default data caches, which will allow Adaptive Server to perform large I/O.
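
A minimal sketch of creating a named cache with a 16K pool and binding an object to it (the
names and sizes are hypothetical):

-- create (or resize) a named cache
sp_cacheconfig "order_cache", "100M"
go
-- carve a 16K buffer pool out of that cache for large I/O
sp_poolconfig "order_cache", "20M", "16K"
go
-- bind a table to the cache
use salesdb
go
sp_bindcache "order_cache", salesdb, dbo.orders
go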

There are four components in a data cache - the buffer, buffer pools, MRU/LRU chains, and
hash table. A buffer is a data structure that can hold one or more data pages in cache. The
illustration on the following page shows the components that form named cache structures.
Every buffer has a header and holds data pages. Each buffer can be 2K, 4K, 8K, or 16K.
Every buffer is linked into an MRU/LRU chain.

Buffers of the same size form a buffer pool. A single data cache can have more than one
buffer pool, and every data cache has at least a 2K buffer pool. The hash table contains the
list of all the buffers that are currently in the cache; it is used to locate the pages of a specific
object if they reside in the cache. If the page being sought is not found in the hash table, the
buffer manager performs a disk I/O to bring the data into the cache.

When a memory pool is deleted, that memory is not available until the Adaptive Server is
rebooted. Otherwise, the objects that are bound to the cache still continue to be referenced.
Once the Adaptive Server is rebooted, the pages are automatically added to the default cache,
unless it is rebound again to another data object.

The MRU/LRU Chain

Every buffer pool has an MRU/LRU chain, which helps Adaptive Server determine which
buffers are clean and available to the buffer manager. The MRU side of the chain holds the
most recently used data. The buffers at the LRU side of the chain are clean and are available
to the buffer manager for reuse. The size of the buffer pool determines the overall length of
the chain.

Note A clean page matches its corresponding page on disk. Dirty pages get their name
because they have been changed in the data cache and have not yet been written to
disk.

Cleaning Buffers

Whenever a query is issued and there is a need for a clean empty page, Adaptive Server tries
to clean the existing data in the data cache by cleaning the LRU end of the chain. The buffers
in the data cache are cleaned by Adaptive Server using several methods:

• The wash point or the wash marker
• The housekeeper task
• Checkpoint

The wash point needs to be decided based on how often the data needs to be in the cache and
when it can be cleaned. This is application-dependent. Whenever the data in the data cache
traverses the wash point, Adaptive Server starts asynchronous I/O on that page. Then the
write completes and the page becomes available. The buffer manager then reallocates this for
future use.

Checking how much data has moved past the wash point is a housekeeping task. These
housekeeper tasks are usually performed when the CPU is idle. During CPU idle cycles, the
housekeeper task cleans dirty pages starting from the wash point and works its way up toward
the MRU side of the chain. The checkpoint process also forces the housekeeper to wash dirty
pages.

Cache Strategy

Data cache in Adaptive Server uses two major strategies for efficient performance. They are:

• LRU (Least Recently Used) replacement strategy
• MRU (Most Recently Used), or fetch-and-discard, replacement strategy

When a query does many reads of, or a number of updates to, the same pages, the LRU
strategy is used. On the other hand, if the query needs to read the data only once, the MRU
(fetch-and-discard) strategy is used, discarding the information after use. The query optimizer
decides which strategy is best while creating the query plan. The showplan output for the
query or stored procedure will reveal the strategies used in the plan.

Tuning for performance using the data caching strategy depends on whether the application is
an OLTP or DSS type.

Choosing the Cache Strategy

Whenever a query needs to scan the data pages or the leaf level of a non-clustered index (the
latter usually known as a covered query), the optimizer makes a choice between the LRU and
the MRU strategy. The fetch-and-discard (MRU) strategy will be used:

• When a query performs a table scan.
• When a query uses a clustered index.
• When there's a nested loop join (only when the inner table is bigger than the cache
size).
• When there's an outer table in the nested loop.
• When a query needs to read the table just once.

Whatever method the optimizer chooses, there is always a way to direct it to a particular
strategy. First, in the select, update, or delete statement, give the strategy type after the table
name in the FROM clause. Second, use the sp_cachestrategy procedure before executing the
query; this command enables or disables the MRU strategy for an object.

If an MRU strategy is specified, then the buffer manager checks which part of the buffer chain
data resides. If the data page is already in the data cache, the data will move to the MRU part
of the buffer chain from the wash marker, and that data is available for the length of the query
(see the syntax on the next page). The root and intermediate pages will always make use of
the LRU strategy.

select <col_name>...
from <tablename> (index <indexname> prefetch <size> [lru|mru])
    [, <tablename> ...]
where ...

delete <tablename> from <tablename> (index <indexname> prefetch <size>
    [lru|mru]) ...

update <tablename>
set <col_name> = <value>
from <tablename> (index <indexname> prefetch <size> [lru|mru]) ...

Example of specifying the LRU replacement strategy with 16K I/O:

select au_lname, au_fname, phone
from authors (index au_names prefetch 16 lru)

Make a careful analysis of the type of I/O and the scan the query will make before deciding to
manually override the cache strategy.

Syntax for cache strategy:

sp_cachestrategy dbname, [ownername.]tablename
    [, indexname | "text only" | "table only"
    [, { prefetch | mru }, { "on" | "off" }]]

This command turns off the large I/O prefetch strategy for the au_name_index of the authors
table:

sp_cachestrategy pubtune,
authors, au_name_index, prefetch, "off"

Large I/O and Cache Strategies

A status column in the sysindexes table identifies whether a table or index should use large I/O
prefetch or the MRU replacement strategy. Adaptive Server turns both features on by default.
Depending on the application's needs and performance considerations, either strategy can be
turned on or off using the following syntax:

sp_cachestrategy <dbname>, [<owner>.]<tablename>
    [, <indexname> | "text only" | "table only"
    [, { prefetch | mru }, { "on" | "off" }]]

Only the owner of the table or the system administrator can change the status of the database
objects. At any given time, the cache strategy for the database object can be viewed by issuing
the command:

sp_cachestrategy <databasename>, <tablename>

Another way to look at the strategy in use is the showplan output for the query, which displays
the cache strategy used for every database object:

• Cached (LRU) Buffers reports the number of buffers that used the normal cache strategy
and were placed at the MRU end of the cache. This includes buffers read directly from
disk as well as buffers found in cache; in either case, the buffer is placed at the MRU end
of the cache when the logical I/O completes.
• Discarded (MRU) Buffers reports the number of buffers that were placed at the wash
marker, using the fetch-and-discard strategy.

showplan also displays the strategy used for any worktables. The current settings for an object
can be checked with sp_cachestrategy:

sp_cachestrategy pubs2, authors

object name index name large IO MRU


----------------- ----------------------------- -------- --------
dbo.authors auidind ON ON

(1 row affected)
(return status = 0)
Note You must be in the database that contains the object when you run this command.
Otherwise, you will get the message, 'Object must be in the current database.'

Tuning Named Caches

Application tuning is determined by the nature of the application - whether it is online
transaction processing, decision support, or a combination of both. Whatever the kind of
application, tuning needs thorough analysis; remember that tuning is an iterative process.
Tuning with named caches requires knowing the following:

• The size of the available data cache
• The cache hit ratio
• The transaction types
• The heavily used tables
• The current cache strategy

Once the above are determined, adjust the caches and buffer pools, and run a benchmark test
after each change. Cache tuning is an evolving process that must be applied and reapplied based
on the benchmark results. Adaptive Server performance can be dramatically improved - or
hurt - by the way named caches and memory pools are configured and data objects are bound
to them.
A cache that is not used effectively hurts performance. Binding a database or database object
that rarely uses a cache wastes space allocated from the existing data cache; the other caches
then have less space available for data caching, and more disk I/O can result. Conversely, if
heavily used database objects are not bound to a named data cache, performance also suffers
because their pages do not stay in cache as long and more disk I/O is performed.

Adding a pool that is seldom used also hurts performance. If a 16K pool is added, the memory
is taken from the 2K pool, which means less memory is available for 2K buffers. Monitor
whether the cache hit ratio of the 2K pool changes while simultaneously evaluating the
effectiveness of the 16K pool.
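
A 16K pool is typically added to a cache with sp_poolconfig; a minimal sketch, assuming a
2K-page server and an existing named cache called pub_cache (the cache name and size are
illustrative only):

-- move 10 MB of pub_cache into a 16K buffer pool (taken from the 2K pool)
sp_poolconfig "pub_cache", "10M", "16K"
go
-- check the resulting pool layout and sizes
sp_cacheconfig "pub_cache"
go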

A pool that is overused hurts performance. If a 16K pool is configured and all of the queries
use it, I/O rates will increase. The 2K cache will be underused, while pages are rapidly cycled
through the 16K pool. The cache hit ratio in the 16K pool will be very poor. It is important to
balance the pool utilization within a cache because the performance can increase dramatically.
It is also important to make the configuration change and test the performance with similar
workloads.

Strategy for OLTP Type Transactions

OLTP type transactions perform many inserts, deletes, and updates, so the goals are to minimize
contention, increase concurrency, and balance physical I/O across all available devices.

What is the best choice? The following should be in place for OLTP type transactions:

• Named caches - Creating named caches will reduce the I/O needed for accessing the
pages.

Note Often it is enough to bind only the index pages to the named cache. If other objects also
need to be bound, it is advisable to bind them separately.

• Configure the log I/O size - Setting an appropriate log I/O size will reduce the number
of I/Os required to write to the transaction log. The recommended size is 4K, which is
the default.

Note Any new log I/O size must correspond to an existing buffer pool in the log's cache. When
the transaction log I/O size is changed, Adaptive Server switches the log I/O to the buffer
pool that corresponds to the new I/O size.

• Partition heap tables - Creating partitions on heap tables reduces contention on inserts.
Because each partition has its own last page and the data is spread across several
devices, insert contention drops and the disk I/O load is balanced.
• Raise the lock promotion threshold - Raising the lock promotion threshold reduces
contention caused by unwanted table-level locking (see the sketch after this list).
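
The last two items can be put in place with commands like the following. This is only a sketch;
the table name and threshold values are hypothetical and should be sized for your own
workload:

-- split the heap table into four partitions so inserts have four "last pages"
alter table orders_heap partition 4
go
-- raise the page lock promotion thresholds for that table
sp_setpglockpromote "table", "orders_heap", 800, 1600, 95
go
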
With the normal (LRU) strategy, the required data is read from disk and placed at the MRU end
of the chain so that it remains in cache for the duration of the query.

Strategy for DSS Type Transactions

In decision support transactions, the priority is usually to reduce physical I/O, because there are
many table scans and multi-table joins. For DSS type transactions, focus on the following to
gain better performance:

• Binding hot tables to named caches - When a table is very heavily used, we need to
make sure it stays in cache. Frequently read and updated objects can be bound to named
caches, as can heavily used smaller objects or just their index pages. Binding objects to
named caches pays off because reading from cache is much faster than reading from
disk, and giving an object its own cache increases the cache hit ratio for that object (see
the sketch following this list).

Note If a bound object is bigger than the named cache it is bound to, there will still be
performance problems due to cache misses.

• Fetch-and-discard cache strategy - The fetch-and-discard cache strategy enables
Adaptive Server to place freshly read pages at the buffer wash point instead of at the
MRU end of the MRU/LRU chain. This reduces contention for the MRU end of the
buffer chain, which may reduce cache misses.
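
Creating a named cache and binding a hot object to it is usually done with sp_cacheconfig and
sp_bindcache. A minimal sketch; the cache name and size are illustrative, and the pubs2 objects
are used only as an example:

-- create a 50 MB named cache
sp_cacheconfig "hot_cache", "50M"
go
-- bind a heavily used table and one of its indexes to it
sp_bindcache "hot_cache", pubs2, titles
go
sp_bindcache "hot_cache", pubs2, titles, titleidind
go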

Most OLTP type transactions use the normal cache strategy, reading data and placing it at the
MRU end of the chain. In a DSS type transaction, however, that behavior interferes with
concurrent transactions and wastes cache, because no updates are going to take place. With the
fetch-and-discard strategy, Adaptive Server places the pages read from disk at the buffer wash
point, and they are discarded once the required data has been read by the query.

Relaxed Strategy

An exception to the above cache strategies is the 'relaxed LRU' replacement policy. Caches
configured with the relaxed LRU replacement policy use the wash point but are not maintained
on a strict MRU/LRU basis. A special strategy ages out index pages and OAM pages more
slowly than data pages.

These pages are accessed frequently in certain applications, and keeping them in cache can
significantly reduce disk reads. Adaptive Server may also choose the fetch-and-discard (MRU)
replacement strategy, which does not flush other pages out of the cache with pages that are used
only once for an entire query.

The checkpoint process ensures that if Adaptive Server needs to be restarted, the recovery
process can be completed in a reasonable period of time. When the checkpoint process
estimates that the number of changes to a database will take longer to recover than the
configured value of the recovery interval configuration parameter, it traverses the cache,
writing dirty pages to disk. The housekeeper task writes dirty pages to disk when idle time is
available between user processes.

Performance with Large I/O

Any default cache or named cache can be split into pools that use large I/O. The default I/O
size is one logical page (2K with the default page size). When a query reads a large amount of
data into the cache sequentially, pools can be configured for large I/O, because Adaptive Server
can then read up to eight data pages in a single I/O. Types of queries that may benefit from
large I/O are:

• Queries that do a table scan and fetch large volumes of data, or the entire table.
• Queries without a WHERE clause.
• Aggregate queries that use a WHERE clause but have no index on the WHERE clause
column.
• Range queries on tables with clustered indexes.
• Queries that form a Cartesian product.
• Joins that fetch large volumes of data.
• Queries that access text or image data.

However, there are exceptions. Caches configured with a relaxed LRU replacement policy are
not maintained on an MRU/LRU basis; as mentioned earlier, a special strategy ages out index
pages and OAM pages more slowly than data pages. Adaptive Server may also choose the
fetch-and-discard strategy, which does not flush other pages out of the cache with pages that are
used only once for an entire query.
The checkpoint process watches over database recoverability: if Adaptive Server needs to be
restarted, recovery should complete in a reasonable amount of time. If the checkpoint process
estimates that recovery would take longer than the configured recovery interval, it traverses the
cache, writing dirty pages to disk. The housekeeper task writes dirty pages to disk when idle
time is available between user processes.

If the goal is to maintain a high cache hit ratio, check the query plan with statistics io turned on.
If the query uses 16K I/O, it reads eight 2K pages with each I/O operation; if statistics io reports
50 physical reads, the query has read 400 pages. More information on the cache hit ratio can be
found in the data cache management section of the sp_sysmon report.
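
For example, statistics io can be turned on for a session to see how many physical and logical
reads a query performs; a short sketch, with a hypothetical table name:

set statistics io on
go
select count(*) from big_orders
go
-- If the plan uses 16K I/O on a 2K-page server, each physical read brings in
-- eight pages, so 50 physical reads means roughly 400 pages were read.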

Cache Configuration Considerations

The following are the main reasons for creating a named cache and binding data objects to it:

• To reduce contention for the spinlock on multi-engine servers.
• To improve the cache hit ratio.
• To reduce disk I/O.
• To reduce lock contention.

Gather as much information about the data as possible and run benchmarks. Based on that
information, plan the data cache sizes, identify the objects to bind to the named caches, and then
implement the change.

The first step in developing a cache plan is to provide as much memory as possible for the data
cache. In previous versions of Adaptive Server, the data cache received whatever memory was
left over after all other configuration parameters that use Adaptive Server memory were
configured. With version 12.5, the procedure cache and the data cache can each be assigned an
absolute value. Baseline performance figures can be used to establish the tuning goals.

Each cache requirement should be analyzed by its I/O pattern, and pool requirements can be
evaluated by analyzing query plans and I/O statistics. Some decisions can be made without any
evaluation. For example:

• The size of the tempdb cache
• The size of any log cache and the log I/O size
• The size of any table or index that needs to be kept entirely in the data cache

A large I/O pool of any size can be added to index or data caches. Once the above are
determined and satisfied, other decisions can be made based on the query plans and I/O
patterns.
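
The first two items in the list above are commonly handled by giving tempdb and the
transaction log their own named caches. A sketch only - the cache names, sizes, and the
database name mydb are hypothetical:

-- dedicated cache for tempdb
sp_cacheconfig "tempdb_cache", "100M"
go
sp_bindcache "tempdb_cache", tempdb
go
-- log-only cache for a user database's transaction log
sp_cacheconfig "log_cache", "20M", "logonly"
go
-- binding syslogs typically requires the database to be in single-user mode
sp_bindcache "log_cache", mydb, syslogs
go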

When considering performance goals, if the goal is to reduce spinlock contention, increasing the
number of cache partitions for heavily used caches may be the better solution; high-I/O objects
can also be moved to separate caches to reduce spinlock contention. If, on the other hand, the
goal is to improve response time, make sure the cache hit ratio is high by creating caches for the
tables and indexes that are used most intensively.
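
For example, spinlock contention on a busy cache can be addressed per cache or server-wide; a
sketch, assuming a named cache called hot_cache already exists and using illustrative values:

-- give hot_cache four internal partitions, each protected by its own spinlock
sp_cacheconfig "hot_cache", "cache_partition=4"
go
-- or set the server-wide default number of cache partitions
sp_configure "global cache partition number", 4
go
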
An ideal cache configuration strategy is to size caches in proportion to the number of times
their pages will be accessed by all queries, and to size the pools within those caches in
proportion to the number of queries choosing an I/O size that matches each pool. This is an
iterative process, and the evaluations can be made by analyzing query plans and the disk I/O
management section of sp_sysmon. See Chapter 15 for more details on sp_sysmon and how to
interpret it.

Chapter 8: Parallel Query Processing


Introduction

This chapter describes how to use parallel queries for performance and how to pinpoint any
issues that may occur when using them. Be sure to read the chapter on I/O first, so that you can
judge not only whether you will benefit from parallel queries but also whether they are causing
performance issues.

Why Use Parallel Queries?

In a serial process, when a query issues a read, the process will issue an I/O request and sleep,
waiting for the request to complete. I'll ignore asynchronous prefetch for now. Once the page
has been processed, the process will issue the next I/O and sleep. This continues until the
query completes.

If you monitor CPU and I/O usage during this time and both show low utilization, then there is
scope for adding processes to improve throughput. Additional processes can increase I/O
throughput even beyond what asynchronous prefetch provides, and they will process the I/O
more quickly. It is a balance between throughput and resource usage.

Even with a single engine, parallel processing can improve performance as long as the CPU
usage is less than 100 percent.

Parallel Processing Model

When parallel queries are performed, the server will spawn a parent process called the
coordinating process and multiple child processes called worker processes. Each worker
process is equivalent to a user process in look and size. It will scan a page, and any rows that
qualify will be passed to the coordinating process. The coordinating process will control the
worker processes and merge the results into a single result set to send back to the client.

If the query does not have to order the rows, as each row is received from a worker process, it
is sent to the client straight away. Each time the query is run, the order of the rows returned
could be different. This is due to the scheduling of the worker processes and when I/O is
loaded into cache. This only becomes an issue if you use the set rowcount command. If the
query involves sorting or is using an aggregation function, a work table will be used and the
results will only be sent to the client once the worker processes finish.

Because of the way the model is implemented, queries that return large unsorted result sets may
run slower in parallel. This is due to the merge overhead incurred when the worker processes
send their results to the coordinating process, which in turn sends them straight on to the client.

The coordinating and worker processes are collectively known as a family and are allocated a
family ID for the duration of the job execution. The following example shows the output of
sp_who showing the family ID of 7. For clarity, not all columns are shown.

fid spid status loginname cmd


0 2 sleeping NULL NETWORK HANDLER
0 3 sleeping NULL DEADLOCK TUNE
0 4 sleeping NULL MIRROR HANDLER
0 5 sleeping NULL CHECKPOINT SLEEP
0 6 sleeping NULL HOUSEKEEPER
0 7 running bob select
7 8 running bob WORKER PROCESS
7 9 running bob WORKER PROCESS
7 10 running bob WORKER PROCESS
7 11 running bob WORKER PROCESS
7 12 running bob WORKER PROCESS

In the event that a coordinating thread terminates abnormally, the family ID can be used to
identify orphaned processes.

These processes can use parallel processing:

• Any select statement that scans a table.


• Selects using aggregate functions.
• Select into (the select part, not the insert).
• The outer table of an outer join.
• Multiple tables in nested loop joins.
• The outer block of a subquery.
• Sorting can also be done in parallel, which will benefit create index during the sort-
merge phase, reformatting strategy, sort-merge joins, and selects using UNION,
ORDER BY, and DISTINCT clauses.
• dbcc checkstorage, which will not be discussed here.

Processes will not use parallel processing under these circumstances:

• Any select that returns fewer than 20 pages.
• For a partition scan, there must be at least as many worker processes available as there
are partitions. You can force a hash scan using the parallel option in the FROM clause.
• For a partition scan, the largest partition cannot be bigger than twice the average
partition size. You can force a hash scan using the parallel option in the FROM clause.
• A non-partitioned table with a clustered index. You can force a hash scan using the
parallel option in the FROM clause.
• Inserts
• Updates
• Deletes
• Cursors
• Inner blocks of subqueries.
• Inner tables of an outer join.
• Temporary or system tables.
• When the optimizer determines it is better to run serially.
• When there is a lack of resources.

How Parallel Queries Access Objects

When the optimizer costs a parallel query, it is important to realize that it is based on time and
not resources. A parallel query will typically consume more resources than a serial query.

There are two different ways a parallel query can be performed - hash and partition:

• Hash is a mechanism that mathematically predicts which pages each worker process
handles, based on a specific unique value.
• Partition is the mechanism that accesses pages partition by partition to minimize I/O;
hence, it is the preferred method of the two.

Data can be accessed in the following ways:

• Hash table scan


• Hash non-clustered index scan
• Hash clustered index scan
• Partition scan
• Partition clustered index scan

Hash Table Scan

When you perform a table scan on an allpages-locked table, an I/O is requested for each page.
As each page arrives in the cache, it is scanned until all the pages have been read. When
running multiple worker processes, each worker process requests an I/O on every page but
scans only the pages it is assigned by the hash. The worker processes hash on the page ID.

For example, if you request five worker processes, the first worker process will scan pages 1,
6, 11, etc. The second worker process will scan pages 2, 7, 12, etc., and the third worker
process will scan pages 3, 8, 13, and so forth.

Each worker process has to read every page because it only knows what the next page is by
reading the preceding one. Although this does add some overhead, it will be minimal since all
but one worker process will be using logical I/O. Most of the time, however, the worker
processes will be waiting for physical I/O.

Table scanning a data-only-locked table is not the same as scanning an allpages-locked table,
since its pages do not have previous and next page pointers. Table scans are done using the
OAM pages, so they are called OAM scans. Worker processes hash on either extent IDs or
allocation page IDs, as determined by the optimizer. This provides better throughput because
each worker process does not read every page; worker processes perform less logical I/O on
data-only-locked tables.
Hash Non-Clustered Index Scan

Every worker process reads the root, intermediate, and leaf pages before hashing on the page
ID. Again, each process reads every leaf page, but the overhead is minimal.

If the index covers the query, the optimizer may not choose to use a parallel scan.

Hash Clustered Index Scan

For allpages-locked tables, the same rules apply as for non-clustered indexes, except that the
data pages form the leaf level and the worker threads hash on the key value.

For data-only-locked tables, a clustered index is structurally the same as a non-clustered index,
so the rules above apply.

Partition Scan

When a table is partitioned, each worker process accesses a single partition, and the number of
worker processes equals the number of partitions. If there are fewer worker processes available
than partitions, the query runs in serial mode.

If the optimizer decides it does not want to perform a partition scan, it will not consider a hash
scan. This is done to reduce the optimization time. However, it is possible to force a hash scan
on a partitioned table by using the parallel hint parameter in the query.

An example of this is:

select count(*) from table (parallel 10)

Each worker thread identifies its starting position by reading the partition's control page.

This generates less logical I/O than a simple hash scan and is the best for performance,
especially if all the rules for using partitions are applied.

Partition Clustered Index Scan

This method is only available for allpages-locked tables and is used on tables that are
partitioned and have a clustered index. Because the table must maintain the clustered index
order, the table's partitions will become skewed over time.

Each worker process will access one partition and, depending on the key used for the search,
the worker process will either traverse the index to find the starting page or scan the entire
partition.

If a partition does not contain any qualifying rows, the worker process associated with the
partition will terminate.

Configuring ASE for Parallel Queries

The following sp_configure parameters need to be set to enable parallel queries (an example
follows the list):


• memory per worker process - This is the amount of memory used for
communicating between the worker process and coordinating process. The default is
fine.
• number of worker processes - You need to decide how many worker processes you
require to run simultaneously. Each process takes up around the same space as a user
connection, 158KB with ASE 12.5. Each release appears to extend the size of each
connection.
• max parallel degree - This specifies the number of worker processes per query and
has nothing to do with the number of engines the server is running with. The default
value is 1, which switches off parallel query. Altering this value will cause all query
plans for stored procedures and triggers to be recompiled the next time the query is
executed.
• max scan parallel degree - This value sets the maximum number of worker processes
used for a hashed query on a non-partitioned table or index. If you set this
configuration option to 1, you are disabling hash scans only.
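
As an example, the following server-wide settings enable parallelism with 30 worker processes,
up to 5 per query and up to 3 for hash-based scans; the numbers are illustrative only and should
be sized for your own workload:

sp_configure "number of worker processes", 30
go
sp_configure "max parallel degree", 5
go
sp_configure "max scan parallel degree", 3
go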

The values above are server-wide parameters, which means the worker processes are free for
all: any query that qualifies for parallelism can use them. This can exhaust the pool of available
worker processes, and queries that were intended to run in parallel will execute serially instead.

When deciding on the actual values, you need to know which queries you want to run in
parallel. For single-table queries, determine the number of partitions in the table (or the number
of worker processes per table) and the number of concurrent queries, configure the maximum
number of worker processes that will not saturate the system, and then adjust after monitoring
performance.

Different queries use different numbers of worker processes, and you should monitor how
many each type uses. An example is a nested-loop join, where the number of worker processes
can equal the product of the degrees of parallelism used for the two tables.

The other alternative is to only configure parallel processing for creating indexes. Parallel
sorting is discussed later in the chapter.

Controlling the Number of Parallel Queries

One of the biggest problems with the way parallelism is implemented in ASE is that it is not
easy to restrict which queries run in parallel and which run serially.

First, you have to configure max parallel degree and max scan parallel degree.

Any query qualifying for parallelism will run in parallel. So what happens if you do not want
this query to run in parallel? The only way to force the query to run serially is to perform the
following operations.

Before you issue the query, you can run:

• set parallel_degree n, where n is the number of worker processes to use for the entire
query.
• set scan_parallel_degree n, where n is the number of worker processes to use per table.

Setting these to 1 will force the query to run serially, and setting them to 0 will revert to the
server defaults.

You can also add a parallel hint to the query by setting a parameter after the table name in the
FROM clause (see the sketch following this list):

• parallel n, where n is the number of worker processes for the query.
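
A short sketch of both approaches, using a hypothetical table named big_orders:

-- force this session's queries to run serially
set parallel_degree 1
go
select count(*) from big_orders
go
-- revert to the server-wide defaults
set parallel_degree 0
go

-- or override the degree for a single table in one query
select count(*) from big_orders (parallel 5)
go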

The problem with doing this is that you need to do it for every query, which can be difficult to
manage. The only other alternative is to change the two parallel parameters with sp_configure
so that they are enabled only during the periods when they are most needed.

How Should I Implement Parallel Queries?

At this stage, we understand what parallel processing is and that it can reduce the elapsed time
of queries. The difficulty is implementing a strategy that improves overall throughput for
parallel and serial queries running simultaneously.

Queries running serially spend much of their time waiting for I/O, which is why we require
the parallelism. Once parallelism is set, you remove that bottleneck and it shifts somewhere
else. You will find the shift is toward CPU usage.

The idea is to improve the throughput without waiting for CPU. Once you start waiting, then
you have hit the limit for the query and adding extra worker processes will not improve the
throughput.

Running a query that generates 100% CPU will obviously affect every process trying to
execute, not just the worker processes for that query. This is where it becomes difficult. How
do you set parallelism to a level that suits everyone?

While the documentation may state otherwise, I have seen systems that ran the CPU at 25%
without parallel processing and pegged eight CPUs when parallel processing was turned on.
Make sure you test the system before implementing parallelism in production.

Parallel Query Tests - Single Queries

Below is a table showing a query executed ten times with one to ten processes and the CPU
usage for each one. The example is on a server with three engines. The output shows a select
count(*) from a table for a hash scan where all the data is in cache. We know that I/O
performance will increase, but we must understand the influence of parallel queries and
CPUs.

Number of Processes   Elapsed Time   Engine 0   Engine 1   Engine 2
1                     18 secs        0          0          100
2                     18 secs        2.1        3.2        100
3                     18 secs        86.7       99.4       6.6
4                     16 secs        98.1       100        96.9
5                     16 secs        100        73.6       99.4
6                     16 secs        98.8       89.5       100
7                     16 secs        99.1       100        100
8                     16 secs        97.5       99.4       98.8
9                     16 secs        98.2       97.5       99.4
10                    16 secs        98.8       98.8       98.8

Looking at the results, we can see that the elapsed time never improved once all the engines
became saturated. We have only gained an improvement of two seconds running in parallel,
but we managed to saturate the entire server.

Next, to determine the effect of a hash scan performing all that additional logical I/O, run a
select count(*) for both partition and hash scans. The results are shown in the following table.

                      Partition Scan               Hash Scan
Number of Worker      Elapsed Time    Logical I/O  Elapsed Time    Logical I/O
Processes             in Seconds                   in Seconds
0                     1.5             40500        1.5             40500
10                    0.7             40503        2.2             405110
20                    0.6             40507        4.3             810220
30                    0.7             40506        6.5             1215330
40                    0.6             40511        8.5             1620440
50                    0.7             40511        10.7            2025550

A partition scan will scale very well, but due to the additional logical I/O incurred with a hash
scan, the elapsed time can sometimes exceed that of a serial scan.

Parallel Query Tests - Multiple Queries

Below are tests showing the effect on elapsed time when running a mix of serial and parallel
queries using hash scans. Ten jobs were run simultaneously, each performing a select
count(*). The table was cached so every task was competing for CPU.

Test 1: Ten identical simultaneous queries
Number of worker processes   0   0   0   0   0   0   0   0   0   0
Elapsed time, in seconds     40  46  103 66  117 124 94  76  122 126
Combined elapsed time: 914 seconds

Ten tasks, each performing in serial mode, ran in a combined elapsed time of 914 seconds.

Test 2: Ten identical simultaneous queries
Number of worker processes   5   0   0   0   0   0   0   0   0   0
Elapsed time, in seconds     23  116 120 69  99  115 110 74  128 128
Combined elapsed time: 982 seconds

This time, task one performed a parallel query using five worker processes. The combined
elapsed time increased to 982 seconds, which indicates that even though task one completed
quicker, the extra scheduling and extra CPU used caused the elapsed times of the remaining
tasks to increase.

Test 3: Ten identical simultaneous queries
Number of worker processes   5   0   5   0   5   0   5   0   5   0
Elapsed time, in seconds     79  93  69  116 75  124 82  129 85  117
Combined elapsed time: 969 seconds

This test shows five tasks running in parallel and five tasks running in serial. The total number
of processes has gone from 15 in the last test to 35 in this test. The overall elapsed time is 969
seconds, slightly faster than test 2. Even though the parallel tasks ran faster than the serial tasks,
they were still slower than the single parallel task in test 2. This indicates that performance
depends on the balance between the speed of parallel queries and the available CPU time.

Test 4: Ten identical simultaneous queries
Number of worker processes   5   5   5   5   5   5   5   5   5   5
Elapsed time, in seconds     118 117 126 126 128 126 124 124 127 124
Combined elapsed time: 1240 seconds

This test shows all ten tasks running in parallel, creating a total of 60 processes. The overall
time is 1240 seconds with individual queries running slower than test 1 in serial mode. This is
due to the server spending more time scheduling tasks than performing meaningful work and
being starved of CPU.

Balancing Throughput

I hope you can see that to get the best throughput you need a clear understanding of your
application and the ability to control what runs in serial and what runs in parallel. Partition
scans perform better than hash scans, but do not dismiss hash scans entirely. I found that
running two worker processes per engine yielded the best results.

Insufficient Worker Processes

If there are many simultaneous queries running, you may get in a situation where you have
insufficient worker processes available to run a query. There are two reasons why there could
be insufficient processes to perform a query in parallel:

• All the worker processes are currently in use - If all of the configured worker processes
are in use, then for hash scans the server will run with whatever processes remain; it
may even run serially. For partition scans, the server will run serially.
• For a partition scan, there are not enough worker processes configured for the number
of partitions in the table - To execute a partition scan, there must be enough worker
processes to scan each partition. If there is not, then ASE will silently switch to serial
mode.

You can force the number of worker processes using the parallel hint at the end of the FROM
clause, but this will force it to use a hash scan.

When there are insufficient worker processes available, ASE will output a warning and create
an adjusted query plan with fewer processes, or it may even run in serial.

To control what ASE does in this situation, there is a 'warning' command to change the
default behavior:

set process_limit_action x

where x is one of three options:

• Quiet - The query will run with whatever worker processes are remaining, without
displaying a message.
• Warning - You will get the following message when there are insufficient worker
processes available:

Insufficient worker processes available. Query executed using current available
worker processes. Inform the system administrator (SA) that the value of worker
processes may be too low.

AN ADJUSTED QUERY PLAN IS BEING USED FOR STATEMENT 1 BECAUSE
NOT ENOUGH WORKER PROCESSES ARE CURRENTLY AVAILABLE.

The query will continue with whatever worker processes are remaining.

• Abort - With this option set, the query will abort with the following message:

Msg 11015, Level 16, State 1:

Server 'PRODUK19_SQL', Line 1:

Insufficient worker processes available. To execute, wait until the system is less busy
or set process_limit_action to quiet or warning, or increase the value of worker
processes.

Command has been aborted.

Parallel Sorting

Parallel sorting is one of those really useful performance improvements that is seldom used.
Any command that performs a sort can benefit from parallel sorting. These include:

• Select queries using DISTINCT, UNION, or ORDER BY


• Queries performing a reformatting strategy
• Create index
• Merge joins

Creating a clustered index on a partitioned table always requires a parallel sort.

How a Sort is Performed

The first part of the sort builds a distribution map, which divides the data up so each worker
process knows what data to process.

The worker processes are then split into two types - producers and consumers.

• Producers read data from the input table and copy it to the sort buffers.
• Consumers read data from the sort buffers and sort the rows. If the table is larger than
the sort buffers, intermediate results are written to disk.

Once all the rows are sorted, the coordinating process merges the final results.

Facts about Parallel Sort

A database must have the select into/bulkcopy/pllsort database option set before a parallel sort
will be performed. After you create an index, you cannot use dump tran until you dump the
entire database.
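
Enabling the option follows the usual sp_dboption pattern; a sketch, assuming the pubs2
database:

use master
go
sp_dboption pubs2, "select into/bulkcopy/pllsort", true
go
use pubs2
go
checkpoint
go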

It uses a lot of resources when run in parallel, but the performance improvements make it
worthwhile.

There must be a minimum number of worker processes available for a sort to be performed.

The table to be sorted must be at least eight times the size of the available sort buffers, which
is described below.

Configuring the Number of Producers

You cannot actually configure this value. Adaptive Server Enterprise will use one worker
process for non-partitioned tables, or it will use the same number of worker processes as there
are partitions.

Configuring the Number of Consumers

The number of consumers can be altered from the default. By default, it will use the same
number of worker processes for partitioned tables as there are partitions, and for non-
partitioned tables, it will use the same number of worker processes as there are devices on the
target segment.

The target segment can be on any number of devices. Even if it was on one device, with
modern disk arrays, the device could be configured over multiple physical disks.
When using create index, there is a parameter called with consumers = n, where n is the
number of consumers. If the value specified is larger than the available worker processes,
ASE ignores it.
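
For example, the consumer count can be specified directly on create index; the table, column,
and value here are hypothetical:

create nonclustered index ord_cust_idx
    on orders (customer_id)
    with consumers = 4
go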

With all other work table sorts, ASE will actually use two devices if the target segment in
tempdb is only on one.

Sort Buffers

Before a parallel sort is performed, there must be adequate sort buffers available to use. You
configure sort buffers using:

sp_configure "number of sort buffers" n

where n is a value between 0 and 32767.

Sort buffers range from 2KB to 16KB in size, depending on your configured logical page size.
They are used to hold pages read from tables and pages used in the merge phase. The space for
these buffers is taken from the default data cache when the sort is performed; when the sort
completes, the space is made available again for data pages.

If another sort is using all the buffers, then the next sort will wait for the buffers to become
available.

Configuring ASE for Optimal Performance

The more cache you have available, the better the performance will be. Work table sorts use
the cache that tempdb is bound to, and create index uses the cache that the table is bound to. If
there is no cache binding, the default data cache is used.

Ensure that there are enough sort buffers. If there are not enough, there will be more physical
I/O and the server will need to perform more merge cycles to complete the sort. If there are
too many, you will be taking up cache buffers that could be used for data pages.

Also, ensure that there are large buffer pools created to speed up the physical I/O.

The configured number of sort buffers must be less than 90% of the default pool in the default
data cache. The limit is only enforced at run time, not when the value is configured.

set sort_resources Command

The set sort_resources command allows you to determine what resources will be used by a
create index without actually performing the create index. Using the output, you can
determine what values can be used for the best performance.

Run the command by typing:

set sort_resources on

The output below shows a create index configured with eight consumers.
The Create Index is done using Parallel Sort
Sort buffer size: 15000
Parallel degree: 50
Number of output devices: 1
Number of producer threads: 1
Number of consumer threads: 8
The distribution map contains 7 element(s) for 8 partitions.

The distribution map, which would follow, shows what key values each worker process will
begin processing from.

Number of sampled records: 7346

Expected Performance

To illustrate the benefits of parallel sorting, I ran a test creating a non-clustered index in serial
and parallel modes.

Run 1 (5000 sort buffers):
               Elapsed Time   Engine 0  Engine 1  Engine 2  Sort Buffers  Wait for Sort Buffers
Serial         8 min 21 sec   99.8      68.4      0.0       5000
Parallel - 4   4 min 30 sec   97.9      90.2      29.2      5000          28182
Parallel - 8   4 min 22 sec   97.9      23.1      97.2      5000          28368
Parallel - 16  4 min 29 sec   98.7      26.3      91.8      5000          77866
Parallel - 32  4 min 29 sec   97.2      25.4      28.9      5000          114114

Run 2 (32576 sort buffers):
               Elapsed Time   Engine 0  Engine 1  Engine 2  Sort Buffers  Wait for Sort Buffers
Serial         10 min 13 sec  99.5      0.0       0.8       32576
Parallel - 4   4 min 29 sec   99.2      32.6      98.9      32576         11268
Parallel - 8   4 min 29 sec   31.3      98.8      99.0      32576         15651
Parallel - 16  4 min 26 sec   99.2      30.4      31.6      32576         29476
Parallel - 32  4 min 20 sec   97.1      29.1      27.1      32576         70921

The two runs above show the difference when running with different numbers of sort buffers.
Unfortunately, in this case, the results indicate that increasing the sort buffers gave no
improvement; sp_sysmon showed that we were waiting for I/O, which is why the CPU
percentages are not at 100 percent.

The tests do show that a parallel create index runs quicker. The reason is that more I/O requests
were issued per second than with a serial create. Running with four worker processes was
enough to hit the I/O limit, which is why the figures are fairly even across the different numbers
of worker processes.

The Wait for Sort Buffers column shows that the more sort buffers you have, the less time the
sort stalls. Even so, the sort was still processing faster than the I/O could be retrieved.

Monitoring Parallel Queries

A big problem with parallel queries is knowing why a particular query runs in serial mode. In
this section, I will discuss the tools used to monitor parallel queries.

showplan

Running showplan will indicate what type of parallel run is being performed. The following
example runs a select count(*) using two worker processes.

QUERY PLAN FOR STATEMENT 1 (at line 1).

Executed in parallel by coordinating process and two worker processes.

The lines in the following output that refer to parallel execution are specific to parallel
processing.

STEP 1
The type of query is SELECT.
Evaluate Ungrouped COUNT AGGREGATE.
Executed in parallel by coordinating process and 2 worker
processes.

FROM TABLE
stevetest
Nested iteration.
Table Scan.
Forward scan.
Positioning at start of table.
Executed in parallel with a 2-way hash scan.
Using I/O Size 16 Kbytes for data pages.
With LRU Buffer Replacement Strategy for data pages.

Parallel result buffer merge.

Trace Flags

There are two trace flags used for checking join strategies that also output parallel information
for queries not performing joins. They are 310 and 317.

Trace flag 310 will output the plans considered for performing a query, and 317 will show
you the plans not considered. Using this, you can determine what type of parallel query would
be chosen, even if it still performs the query in serial mode. To set them, run:

dbcc traceon(3604, 310, 317)

The following example shows a table scan using five partitions with the final plan being the
one it will use.

WORK PLAN (total cost = 172148, order by cost = 0, delproj cost = 0)


varno=0 (stevetest) indexid=0 ()
path=0x830fb800 pathtype=sclause
method=NESTED ITERATION
numthreads = 1
outerrows=1 rows=1000000 joinsel=1.000000 cpages=40507 data_prefetch=YES
data_iosize=16 data_bufreplace=LRU lio=40507 pio=5063
corder=0

NEW PLAN (total cost = 172148):


varno=0 (stevetest) indexid=0 ()
path=0x830fb800 pathtype=sclause
method=NESTED ITERATION
numthreads = 1
outerrows=1 rows=1000000 joinsel=1.000000 cpages=40507 data_prefetch=YES
data_iosize=16 data_bufreplace=LRU lio=40507 pio=5063
corder=0

The following output shows the plan that the optimizer finally chose to perform the query.

WORK PLAN (total cost = 34588, order by cost = 0, delproj cost = 0)


varno=0 (stevetest) indexid=0 ()
path=0x830fb800 pathtype=pll-partition
method=NESTED ITERATION
numthreads = 5
outerrows=1 rows=1000000 joinsel=1.000000 cpages=40507 data_prefetch=YES
data_iosize=16 data_bufreplace=LRU lio=8141 pio=1017
corder=0

NEW PLAN (total cost = 34588):


varno=0 (stevetest) indexid=0 ()
path=0x830fb800 pathtype=pll-partition
method=NESTED ITERATION
numthreads = 5
outerrows=1 rows=1000000 joinsel=1.000000 cpages=40507 data_prefetch=YES
data_iosize=16 data_bufreplace=LRU lio=8141 pio=1017
corder=0

TOTAL # PERMUTATIONS: 1

TOTAL # PLANS CONSIDERED: 2

CACHE USED BY THIS PLAN:


CacheID = 0: (2K) 0 (4K) 0 (8K) 0 (16K) 1017

PARALLEL:
min(configured,set) parallel degree = 6
min(configured,set) hash scan parallel degree = 2

FINAL PLAN (total cost = 34588):


varno=0 (stevetest) indexid=0 ()
path=0x830fb800 pathtype=pll-partition
method=NESTED ITERATION
numthreads = 5
outerrows=1 rows=1000000 joinsel=1.000000 cpages=40507 data_prefetch=YES
data_iosize=16 data_bufreplace=LRU lio=8141 pio=1017
corder=0
Note Output from the 310 and 317 trace flags could be huge if there were more tables. Output
size increases as the number of tables in the join increases.

sp_sysmon

The output of sp_sysmon shows whether parallel queries are taking place and whether there are
sufficient worker processes to support all the requests. The output has two sections: worker
process management and parallel query management.
Worker Process Management

This section shows the number of requests for worker processes issued and those that
terminated, the number of worker processes used, and the memory usage for the worker
processes. Use this section to determine if you have sufficient worker processes to perform all
your queries. Below is a sample of the output:

Worker Process Management
-------------------------
                                       per sec      per xact       count  % of total
  Worker Process Requests
    Requests Granted                       0.0           0.3           1     100.0 %
    Requests Denied                        0.0           0.0           0       0.0 %
  --------------------------       ------------  ------------  ----------
  Total Requests                           0.0           0.3           1

  Requests Terminated                      0.0           0.0           0       0.0 %

  Worker Process Usage
    Total Used                             0.1           8.3          33         n/a
    Max Ever Used During Sample            0.1           8.3          33         n/a

  Memory Requests for Worker Processes
    Succeeded                              1.0          65.8         263     100.0 %
    Failed                                 0.0           0.0           0       0.0 %
  --------------------------       ------------  ------------  ----------
  Total Requests                           1.0          65.8         263

  Avg Mem Ever Used by a WP (in bytes)     n/a           n/a       365.1         n/a

Use the Avg Mem Ever Used by a WP (in bytes) value to determine whether the memory
configured for each worker process is sufficient.

Parallel Query Management

This section reports parallel query execution. Below is a sample of the output:

Parallel Query Management
-------------------------

  Parallel Query Usage           per sec      per xact       count  % of total
  -------------------------  ------------  ------------  ----------  ----------
  Total Parallel Queries             0.0           0.0           0         n/a
  WP Adjustments Made
    Due to WP Limit
    Due to No WPs

  Merge Lock Requests            per sec      per xact       count  % of total
  -------------------------  ------------  ------------  ----------  ----------
  Network Buffer Merge Locks
    Granted with no wait             0.0           0.0           0         0.0
    Granted after wait               0.0           0.0           0         0.0

  Result Buffer Merge Locks
    Granted with no wait             0.0           0.0           0         0.0
    Granted after wait               0.0           0.0           0         0.0

  Work Table Merge Locks
    Granted with no wait             0.0           0.0           0         0.0
    Granted after wait               0.0           0.0           0         0.0
  -------------------------  ------------  ------------  ----------
  Total # of Requests                0.0           0.0           0

  Sort Buffer Waits              per sec      per xact       count  % of total
  -------------------------  ------------  ------------  ----------  ----------
  Producer Waits                     0.0           0.0           0       0.0 %
  Consumer Waits                   272.8       17730.3       70921     100.0 %
  -------------------------  ------------  ------------  ----------
  Total # of Waits                 272.8       17730.3       70921

The Parallel Query Usage value shows the number of queries eligible to run in parallel mode. If
a run-time adjustment has to be made because of a resource limit, it is reported in one of the
following ways:

• Due to WP Limit - The number of worker processes was restricted by the set session
option.
• Due to No WPs - There was a lack of available worker processes at the time of
execution.

Merge Lock Requests reports contention waiting for locks during the merge phase and is an
indication of the level of parallel activity running. If any of the three are indicated, then
reducing the number of worker processes may prevent unnecessary scheduling.

Sort Buffer Waits indicates a lack of sort buffers. Configure as many as you can within your
memory constraints. Even with the maximum configured, you will get waits.

Chapter 9: Application Design


Any successful application is a combination of many well-integrated components, including
good client design, reliable network with sufficient bandwidth, efficient database design, fast
and reliable server hardware, and minimal resource contention with other interacting
applications. Often, the failure of a project is the result of omitting the steps to assess the
system impact of other applications or different functions within the same application.

Client Design

The user's perception of an application is often governed by how well the user interface is
designed. Consistent and clear screens with adequate help menus and error messages can
often combat the negativity generated by occasional slow response times. Certain design
decisions should be avoided or carefully designed into the application. Often, the client may
be required to load large amounts of data or to write text data. The use of writetext, fast bulk
copy, and select into are non-logged operations in Sybase and can make it impossible to
recover data in the event of system failure. If the application needs to use any of these
features, they should be limited to a non-critical work database that holds transient data and
can be lost during system failure.

Client applications should also avoid long transactions over the network, as this will lock up
resources while packets are being transferred across the network. Client design should never
embed client interaction within a transaction, as this could result in a transaction that is never
completed and would cause a serious impact on the system.

Resource Contention

Contention occurs at the server level and the application level. Server resources are shared
across the application and between applications sharing the same server. These server
resources are then shared between tasks within the application. The DBMS is an application
sharing resources on the server with other applications, so any contention on resources can
seriously affect the overall system performance. If contention occurs on critical components,
then some processes are forced to wait until the resource can be released, and the application
will not have the expected throughput.

The most common server resources that applications contend for are network bandwidth,
physical disk I/O, CPU time, and memory. The most common Sybase server resources
involved in contention are cache, data and index pages, spinlocks, system tables, and tempdb.
All these critical resources should be monitored to determine any bottlenecks and potential
problem areas, as the system scales due to increased usage.

Network bandwidth determines the volume of data that can be passed to and from the server.
If the network becomes saturated with traffic, then the number of corrupted packets,
collisions, and retries increases, causing an increase in response times. Applications should be
designed to minimize the amount of network traffic.

Single updates generated on the client as dynamic SQL and sent as individual DML
statements are the most inefficient method to perform updates to the database. Each DML
statement sent over the network consists of the SQL syntax and data. Using stored procedures
can reduce network traffic; each stored procedure call consists of the data and the procedure
name only. Whenever possible, multiple statements should be batched rather than being sent
individually to the server. An application should only retrieve the data it requires; all
unwanted rows or columns should be eliminated from the result set.
Another common area of inefficiency is when an application requests the same information
multiple times rather than caching the results, or when commonly used data could be
pregenerated for all processes or users to share rather than being regenerated as needed.

Network packet sizes should be large enough for the application's processing requirements.
Large data transfers require a larger network packet size than the standard 512 bytes. This
space needs to be allocated separately, as it does not come from the default memory pool.
Each packet being sent requires CPU and operating system resources, so the ideal packet size
will be the largest without sending a large number of partially full packets. Network packet
sizes can be set for each connection, allowing application functions to be tuned. TCP no delay
should be turned on for TCP networks that are sending large volumes of small packets; this
eliminates the wait time to fill packets but will send partially full packets. Multiple network
listeners may also reduce network bottlenecks. An application should not unnecessarily open
and close connections, as each open and close requires CPU and operating system resources.
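
On the server side, larger packets are enabled with sp_configure; the values below are
illustrative only, and the memory for packets above the default size comes from additional
network memory rather than the default memory pool:

-- allow clients to request packets of up to 4096 bytes
sp_configure "max network packet size", 4096
go
-- set aside memory for the larger packets (outside the default memory pool)
sp_configure "additional network memory", 1048576
go

A client then requests the larger size at connect time, for example with isql's -A option.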

Physical disk I/O determines the amount of data that can be retrieved from the disk and
passed back to the server for the applications to use.

In OLTP applications, small page sizes are preferable since the amount of data being modified
is small, usually much smaller than a default page size of 2K, so the amount of unused data is
kept to a minimum. In DSS applications, huge amounts of contiguous data are often required,
usually much larger than the default page size of 2K, so increasing the page size beyond the
default of 2K can bring benefits of reducing the number of disk I/O requests and reducing the
total disk access wait time. Page size is a server-wide setting, so careful selection is required
if multiple applications share the same server. The available settings are 2K, 4K, 8K, and
16K. Larger page sizes are useful for DSS systems with wide fact tables or applications that
store a large amount of text, images, or Java objects.

The physical limitations of the disks control the maximum amount of data that can be
retrieved. Many applications exceed the physical limitations of a single disk and require the
data to be split across multiple disks. This physical placement can seriously affect the overall
application performance and should be determined during the design phase. Due to the
internal architecture of Sybase, it is important that the physical placement of tempdb,
transaction logs, and heavily accessed user tables and non-clustered indexes is carefully
selected. The user tables and indexes can be placed on user-defined segments or partitions,
but as these are just logical representations of the data, their success is dependent upon how
these logical components are mapped to their physical components. Each of these logical
storage components should be placed on separate physical disks to reduce the possibility of
device contention. If an application requires loading large amounts of data to a table or
retrieving large amounts of data from the same table, partitioning may significantly improve
performance.

Partitions in particular are not effective unless they map to separate physical disks. Care needs
to be taken when partitioning tables with parallel options set, as each of the worker processes
running in parallel may create contention on non-partitioned tables of an inner join since each
process will have to scan the inner table. If parallelism is used to increase application
throughput, care should be taken to allocate sufficient parallel resources to cover peak usage.
If the parallel resources are not sized based upon peak usage, then the server will exhaust its
resources during peak usage and attempt the parallel access paths using serial processes; this
will often cause very inefficient data access and cause a severe degradation in application
performance.

These logical storage components should also be split across several controllers to reduce the
possibility of controller contention. Disk hardware is becoming more sophisticated with the
use of fiber optics, disk memory, RAID, and software data buffering. When using RAID and
partitioning, it is unlikely that increasing the number of partitions above the number of
physical disks of the RAID devices will produce any increase in performance. Different disk
hardware configurations will produce different results, but as the fastest components are more
expensive, they should only be used when necessary. System peak usage and growth should
be considered during design and monitored during production. Physical disk I/O can be a
critical factor in the overall performance of any large application.

CPU time determines the amount of mathematical processing and shared resource usage on
the server. All applications and database servers share resources through interaction with the
operating system. Each task on the Sybase server is a lightweight process and shares time
with other tasks within the main process or processes. An application can use task-to-engine
affinity to starve processing time from less critical tasks and to increase the available
processing time to critical tasks. Execution classes can also be used to tune application task
priority on the server so that critical tasks assume a higher priority. When the CPUs become
saturated with requests, tasks will be switched in and out frequently, and system
performance will be poor.
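
The following is a minimal sketch of assigning execution classes; the class name, login, and application name are hypothetical, and the exact sp_addexeclass and sp_bindexeclass parameters should be verified against the documentation for your release.

/* define a high-priority execution class */
exec sp_addexeclass 'EC_critical', 'HIGH', 0, 'ANYENGINE'

/* bind a login (LG) and an application (AP) to the class */
exec sp_bindexeclass 'batch_login', 'LG', NULL, 'EC_critical'
exec sp_bindexeclass 'month_end_app', 'AP', NULL, 'EC_critical'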

Certain system tuning may be possible to alleviate this problem. Data sorting could be moved
to the middle tier or the client, applications could be moved to another server, the processors
could be upgraded to faster processors, or more processors could be added.

Memory and Cache

Applications that heavily use databases benefit from large amounts of system memory.
Memory processing is much faster than physical disk access, so providing the application
with sufficient memory to cache data pages can cause significant gains in performance. Total
system memory is shared between the operating system, other applications, and Adaptive
Server. The memory allocated to Adaptive Server is allocated to static data structures during
startup depending upon configuration parameters. The remaining memory is then available for
application requests as procedure and data cache.

Stored procedures, triggers, and parallel queries use procedure cache. Data, index, and log
pages are stored in data cache.

Procedure cache holds query trees for stored procedures, triggers, and parallel queries. As a
procedure is requested, its query tree is read from the sysprocedures table, optimized using the
requested parameters, and placed at the MRU end of the cache. If the procedure is requested
again, the procedure cache is searched for an unused query tree; if one is found, it is placed at
the MRU end of cache. Otherwise, a new tree is generated. When plans get to the LRU end of
cache, they are swapped out of cache. The optimal size of procedure cache is application-
dependent. If the procedure cache is too small, then all cache will be in use and an error will
be generated when the next procedure is requested.
Data cache holds the most active pages for system tables, data, index, and log pages. As pages
are requested, they are placed at the MRU end of the data cache. When a page reaches the
LRU end of the data cache, it is swapped out. If the cache hit ratio is low, then the data cache
is too small and should be increased in size. The default data cache is in 2K pages with a
single spinlock. If there is contention on the default data cache spinlock, then the cache should
be split into multiple caches to reduce the spinlock contention. Some application tables or
processes will benefit from caches or buffer pools configured for larger I/O than the default 2K
pages. Additional named data caches should be defined and objects should be bound to those
caches to improve application performance.
Tables and indexes that are requested frequently can be bound to separate named caches to
improve concurrency.
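
As a minimal sketch (cache names, sizes, and object names are hypothetical, and the exact sp_cacheconfig, sp_poolconfig, and sp_bindcache arguments should be verified for your release), a named cache might be created and objects bound to it as follows:

exec sp_cacheconfig 'hot_objects_cache', '100M'
/* optionally add a 16K buffer pool to the cache for large I/O */
exec sp_poolconfig 'hot_objects_cache', '25M', '16K'
/* bind a heavily used table and one of its indexes to the cache */
exec sp_bindcache 'hot_objects_cache', 'salesdb', 'orders'
exec sp_bindcache 'hot_objects_cache', 'salesdb', 'orders', 'orders_cust_idx'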

tempdb

tempdb is a critical shared resource in Adaptive Server since it is often the most dynamic
database on the server and is shared by all applications and processes. It is a server-wide
resource that is used for many internal processing tasks, such as work tables, sorts,
reformatting, and index builds. Users creating work tables can also use tempdb explicitly.

Contention on temporary tables in tempdb is not a problem, as each table is associated with a
user and is not shared. Contention in tempdb occurs on the system tables. If tempdb is heavily
used, then application performance problems can arise. The potential use of tempdb should be
assessed during design. Queries using certain SQL commands such as DISTINCT, ORDER
BY, and GROUP BY all require internal work tables in tempdb. Sizing and placement of
tempdb is particularly important.

If the application requires a very large tempdb, then it should be placed on small, fast disks.
These disks should have no other data on them. On some systems, tempdb will perform much
faster using an operating system file rather than raw devices. This is because an operating
system file uses buffering, which is faster than direct writes to a raw device. Since tempdb is
never recovered, it has no system recovery issues.

Ideally, tempdb should be placed in memory, as this will give significant performance
benefits. Normally, tempdb will use the default data cache, but as this is a shared resource,
tempdb will have contention with other processes; tempdb should be bound to its own data
cache for improved performance.
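
A minimal sketch of this binding (the cache name and size are examples only):

exec sp_cacheconfig 'tempdb_cache', '200M'
/* bind the entire tempdb database to its own named cache */
exec sp_bindcache 'tempdb_cache', 'tempdb'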

Application Maintenance

Application maintenance is often a neglected area in application design. As an application
is used over time, its data structures become fragmented and require periodic maintenance.
System maintenance and database restores should be designed and practiced on a periodic
basis so that, in the event of a system failure, a trusted recovery strategy can be followed. These
maintenance functions should be performed every three to six months so that the procedures
can be refined, practiced, and timed.

Application maintenance can be very intrusive on application performance. Backups, index
rebuilds, reorgs, dbcc checks, and statistics updates are all necessary maintenance tasks that
lock important system resources and will contend for system resources with the application.
Maintenance should be performed during the lowest application activity and should be
planned, scheduled, and monitored.
Reorgs and clustered index rebuilds hold exclusive locks on the table, while backups, dbcc,
non-clustered index rebuilds, and update statistics hold shared locks on the table.

Some maintenance activities, such as large clustered index rebuilds, can be more efficiently
built by unloading the data, sorting the data, loading the data, and building the clustered index
using the sorted data option. Unloading and loading data can be more efficient by using
partitioning, parallelism, and large cache buffer pools. On large clustered indexes, this can
save hours of processing.
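
As an illustrative sketch (table, column, and file names are hypothetical), the final step of such a rebuild uses the sorted data option so that the server does not re-sort the rows:

/* bcp the data out, sort the file by the clustered key at the      */
/* operating system level, truncate the table, and bcp the sorted   */
/* data back in; then rebuild the clustered index without a sort:   */
create clustered index orders_clust
on orders (order_date)
with sorted_data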

On large systems, backups and restores should be done in parallel to keep the maintenance
window as short as possible. Considerable care needs to be taken during the synchronization
of backups if updates occur across databases, as the backups could have inconsistent data
across databases. Backups should be done when there is minimal activity on the server.

Indexes

The majority of the time used to process an application query is spent retrieving the data pages from
disk. It is therefore crucial to the overall performance of an application that indexes are
carefully selected to minimize the number of pages read to perform the majority of
application queries. During the application design phase, the most critical and frequently used
functions should be analyzed to determine the data and access paths required to achieve
efficient data page access. Many applications will have many data access points and possible
conflicting index requirements. A matrix of columns used should be made for each table and
the cardinality for these columns should be derived. This matrix should include columns used
for direct access to the table and those used in joins. Columns used in sorting and the type and
order of the sort should also be recorded. From this matrix, it should be possible to build the
best indexes for efficient access. If multiple applications share the same server, their
respective matrices should be combined to produce a server index strategy.

Index columns should be ordered from the most selective to the least selective and from the
most commonly referenced columns to the least referenced columns. For example, if an index
consists of age and sex, then age should be the primary column of the index since it is more
selective than sex. If sex was used as the primary column, it is unlikely to be used, as the
optimizer will expect to retrieve 50 percent of the rows for a query using sex as the SARG. If
age and state are frequently used together to access a table, and state is used some of the time
without age, then the state should be the primary column of the index since it is used more
frequently even though the selectivity between the two columns is similar. If an application
needs to access tables using columns with low selectivity, then partitioning the tables may
help improve response times since the data can be retrieved and merged in parallel, thus
reducing the overall response time.
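
Using the age, sex, and state examples above, the resulting indexes might look like this (table and column names are hypothetical):

/* age leads because it is more selective than sex */
create nonclustered index member_age_sex
on member (age, sex)

/* state leads because it is used both with and without age */
create nonclustered index member_state_age
on member (state, age)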

Sybase supports clustered and non-clustered indexes. Clustered indexes sort the table in the
sequence of the index, the lowest level of the index being the data page, while non-clustered
indexes provide a pointer to the data page and do not sort the table.

It is very important to choose the best clustered index, as only one can be selected per table. A
clustered index can be chosen on columns used in range queries, on frequently used join
columns, to reduce query sorting, and to evenly spread data during inserts.
Partitioning and Parallelism

Partitioning can reduce both physical and logical resource contention. If a partitioned table is
split across multiple physical disks, then as rows are inserted, fewer inserts are competing for
I/O resources. At the logical level, partitioning allows multiple insert points for a table so
contention at the page level is also reduced. Partitioning can also improve response times for
reads, as it allows data to be read in parallel.

Although partitioning can improve performance for data retrieval, the performance can be
seriously degraded if parallel resources are not correctly allocated. Parallel reads and writes
use worker processes that draw resources from a shared pool. If there are insufficient parallel
resources for all concurrent requests, then a process that has been optimized to run with
parallel worker processes will run as a serial process. Often the optimized plan for a partitioned table
will be very different with parallel options set than if parallelism is not available. In such
cases, the system may respond very poorly when serial scans are used instead of parallel
scans.
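
As a sketch (the table name and values are examples only), a table can be given multiple partitions and the server-wide parallel resources sized for peak usage:

/* create eight round-robin partitions for the table */
alter table orders partition 8

/* size the shared pool of worker processes for peak concurrent use */
exec sp_configure 'number of worker processes', 40
exec sp_configure 'max parallel degree', 8
exec sp_configure 'max scan parallel degree', 4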

Deadlocks

Deadlocks are a special type of resource contention and occur when two different processes
hold a data or index page that the other process needs. In this situation, both processes cannot
continue and are in deadlock. The server identifies the deadlock and kills the process that has
accumulated the least amount of CPU time. Since the server has to apply a rollback on the
resources that were updated by the victim prior to the second process continuing, this situation
can be very costly and should be avoided if possible.

Deadlocks have always been a problem in Sybase. This is due to the designers' choice to use
page locking rather than row locking. Page locks were chosen for simplicity and speed, but
the trade-off was increased page contention. The application should be built to minimize page
contention. Several design features can be utilized to minimize page contention; these include
consistent access paths, short transactions, and retry logic. These design features should be
explored prior to changing the locking scheme to data page or row locking as these locking
schemes incur more overhead and require more data maintenance. If contention or deadlocks
are still a problem, then data page locking or row level locking should be used.
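
For example (the table name is hypothetical):

/* switch a contentious table from allpages to data-row locking */
alter table order_detail lock datarows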

Deadlocks can be reduced dramatically by analyzing all the main data update access paths and
determining an access order for the heavily accessed tables. Since most relational databases
exhibit a hierarchical structure, you should process the updates from parent to child. This
process is simple to enforce using stored procedures, but it is difficult to control if the
application generates dynamic SQL or uses a generic code generation tool. Where data access
conflicts exist, a simple lock procedure can be used by one of the processes to reduce the
likelihood of deadlocks. The transaction chosen for the lock should be the one with the
shortest transaction length or the transaction of lowest frequency. This analysis should be
performed within each application and across applications sharing the same database.

Lock promotion from page or row locks to table locks can contribute to deadlocks or
additional resource contention. Tables likely to acquire a large number of page or row locks
should have their lock promotion threshold increased to avoid lock promotion to table locks.
The server should be configured to allow for all concurrent row and page locks.
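
A minimal sketch of both settings (object names and values are hypothetical):

/* raise the page lock promotion thresholds (LWM, HWM, PCT) for a hot table */
exec sp_setpglockpromote 'table', 'order_detail', 3000, 5000, 95

/* ensure enough locks are configured for all concurrent page and row locks */
exec sp_configure 'number of locks', 100000
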
Application-induced deadlocks occur when the application uses a thread to read data rows and
another thread to update the data rows. The first thread holds a shared lock on the row or page
and waits for the second thread to process successfully, while the second thread tries to
acquire an exclusive lock but has to wait for the first thread to release a shared lock.
Application-induced deadlocks cannot be detected by the server and will remain in deadlock
until one of the threads is killed. Application deadlocks can lead to serious blocking on the
server, forcing other processes to wait for the blocking processes to complete; this can cause
the entire application to stop processing and should be avoided.

Isolation Levels

The SQL standard defines isolation level 3 as the default. This is the most secure for data
consistency but is the most intrusive to concurrency. Adaptive Server uses isolation level 1 as
the default.

Isolation level 0 allows a process to read uncommitted data. Although no locks are acquired,
the data may not be accurate, so isolation level 0 has only a limited use and should be used
with caution. It is unlikely that OLTP applications will be able to use this level of isolation,
but DSS applications are interested in trends rather than specific data and may find that
isolation level 0 can produce a significant improvement in response times.

Isolation level 1 stops a process from reading uncommitted data by holding an exclusive lock
on the page being updated until the end of the transaction, while any shared locks are released
after the page has been read. Isolation level 1 is less intrusive than isolation level 2 or 3. If an
application does not perform multiple passes through the data, then it may provide a
performance improvement to use isolation level 1 instead of isolation level 3. If isolation level
1 is used, then updates should be performed using the timestamp for data verification to
prevent lost updates. If two processes read a row, both will release the shared lock after the
page is read. If the first process updates a row and commits, then the second process could
update the same row without rereading the data and the first update would be lost. This can be
prevented by updating the row with the row timestamp used in the WHERE clause.
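
A minimal sketch of this verification, assuming the table carries a timestamp column (all names here are hypothetical, and @id and @new_balance are assumed to be procedure parameters):

declare @old_ts varbinary(8)
/* read the row and remember its timestamp */
select @old_ts = row_ts
from account
where account_id = @id

/* later: apply the update only if the row has not changed since it was read */
update account
set balance = @new_balance
where account_id = @id
and row_ts = @old_ts

if (@@rowcount = 0)
raiserror 99999 "Row changed by another user; update not applied"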

Isolation level 2 prevents non-repeatable reads. If a transaction repeats a read while another
process has updated one of the previously selected rows, the repeated read will return
different data from the first. This is prevented in Adaptive Server by processes holding shared
locks until the completion of the transaction.

Isolation level 3 prevents phantom reads. If a transaction repeats a read while another process
has inserted or deleted rows within the result set, the repeated read will return a different
number of rows from the first. This is prevented in Adaptive Server by processes holding
shared locks until the completion of the transaction. Since Adaptive Server uses the same
technique to solve non-repeatable and phantom reads, isolation levels 2 and 3 are not provided
individually.

An application should use the minimum isolation level required for the task; this will
minimize the amount of data and index page contention, improve performance, and reduce the
risk of deadlocks. An application can change the isolation level during processing at the
session level using the set command or at the query level using the AT ISOLATION clause.
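
For example (the table name is hypothetical):

/* session level */
set transaction isolation level 0

/* query level */
select name, balance
from account
at isolation read uncommitted
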
Cursors

SQL is a set-based language and allows efficient and simple access to data using its standard
syntax. Occasionally, applications require the processing of individual rows depending upon
the row data; to accommodate this type of processing, the SQL language provides cursors.
Cursors, by default, lock the current row with a shared or update lock depending upon the
type of cursor; if the row is not updated, these locks are released when the next row is read. If
a cursor is used to update rows during a transaction, the exclusive locks are held until the end
of the transaction, and closing the cursor does not release them. In addition, the lock on the
cursor's current page is retained even after the transaction completes unless the cursor is
closed. Care should be taken in an application to correctly complete the transaction and close
the cursor so that locks are not held for too long.

Cursors are not as efficient as set-based updates and will hold locks for a longer period of
time. Cursors should only be used when the application needs to process each row
individually. Otherwise, the standard set-based updates should be used, as this will reduce the
transaction length and resource contention. If isolation level 3 is used in a cursor, then all the
locks are held until the end of the transaction. Due to the relative slowness of cursors, this can
be very intrusive on other processes because of page contention.
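
A minimal cursor sketch inside a stored procedure (all names are hypothetical), showing the cursor being closed and deallocated as soon as the row-by-row work is complete:

create procedure adjust_balances
as
declare @account_id int,
@balance money
declare account_cur cursor
for select account_id, balance
from account
where region = 'NE'
for update of balance

open account_cur
fetch account_cur into @account_id, @balance
while (@@sqlstatus = 0)
begin
update account
set balance = @balance * 1.01
where current of account_cur
fetch account_cur into @account_id, @balance
end
close account_cur
deallocate cursor account_cur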

Transaction Length

Application data processing and the network should be removed from the transaction if
possible. Many applications have been designed to process each row separately and submit
the updates individually within a single transaction. If many updates make up the transaction,
then the transaction has to be opened and closed on the client and the length of the transaction
could span multiple network packets.

With this type of design, both the row processing and the network call are held within the
transaction, which unnecessarily increases the length and size of the transaction. All data
processing for all the rows should be performed before the transaction is opened, but even
with this design, each row is applied individually and the transaction is still open over many
possible network packets. If these DML updates are performed by dynamic SQL, the SQL
parsing and optimization will also be included within the transaction. With a large number of
updates, such a design could have a lengthy transaction, which could cause considerable
contention problems.

An alternative design could be to load the data on the server in work tables. Once the data is
loaded, a procedure could be executed to load the data as a transaction. This would remove
the network, data transfer, SQL parsing, and optimization from the transaction while also
allowing a mass modification rather than executing each action individually.
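
A sketch of this approach (table, column, and procedure names are hypothetical): the rows are first loaded into a work table outside of any transaction, and a single procedure then applies them as one set-based transaction:

create procedure apply_price_changes
as
begin transaction
update product
set price = price_work.new_price
from product, price_work
where product.product_id = price_work.product_id
if (@@error != 0)
begin
rollback transaction
raiserror 99999 "Error applying price changes"
return 2
end
delete price_work
commit transaction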

Data Validation

A good design performs data validation on the client or middle tier, but you should not
overlook the need for server-side data validation. The server is, after all, the repository for the
business data and should ultimately ensure data integrity, as business systems are not closed
systems. Care should be taken over the location of data validation. Application updates will be
simplified if the data validations occur in one central point, either on the database server or in
the middle tier. Only simple edit checks should be performed on the client or each release
may require new client software to be distributed. The types of validation that an application
should enforce are referential, domain, and business.

Referential validation is the enforcement of primary, alternate, and foreign keys. Since all
these validations involve associated data of all other rows in related tables, the server is the
best place to perform all such validations. If the client or middle tier performed these
validations, they would need to have a complete and current copy of the data; this would be
very inefficient and require excessive network traffic. A design rule of thumb is that the
parent table carries the data validation checks and the child tables carry foreign key
checks back to the parent. This minimizes data validation duplication. These checks have to be
enforced on insert and update. A common application design error occurs when a surrogate
key is generated on the client or middle tier and is assumed to be unique, and no additional
validation is placed on the server. The surrogate key itself is unique, but the real primary key
is not checked by the application and referential data errors occur.

For example, a primary key on a table could be state and name. As this primary key is large
and would be used in joins, a design decision is made to generate a surrogate unique number
for the row, which is then used as the foreign key by related tables. Any insert or update to the
table needs to check that the surrogate key is unique but more importantly, that the new or
updated row does not have the same state and name value as another row in the table.
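
A minimal sketch of enforcing both the surrogate key and the real primary key on the server (names are hypothetical):

create table agency
(
agency_id numeric(10,0) identity,
state char(2) not null,
name varchar(60) not null,
constraint agency_PK primary key (agency_id),
constraint agency_AK unique (state, name)
)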

Domain validation is the enforcement of data type and range checks for an attribute. As
domain checks do not require any related data from other objects, they can be performed on the
client, in the middle tier, or on the server.

Business validation is application-dependent and can be very complex in nature. These checks
may require related data and should either be in the middle tier or on the database server.
Some business rules are so complex that they are not efficient on the database, whereas other
business rules require so much related data that the only efficient place for the validation is on
the database server. With the new feature in Adaptive Server allowing Java functions within
the database server, it may now be more efficient to perform complex data and business
validations on the database server.

Data Processing

Application processing is often the most critical component in the overall performance of the
application. This is often a balance of data retrieval, data transformation, and data recovery.
Data retrieval is most efficient on the database server, but data transformations are more
efficient within the application. As the amount of data an application requires to process
increases, the amount of data that would need to be retrieved, modified, and stored back to the
database increases. This can be very inefficient, and it may be more efficient to process the
data on the database server. Applications often use the database as a persistent repository and
design inefficient processes to retrieve and modify that data. If an application sources its data
external to the database, it should perform all possible processing on the data prior to storing
it for persistence. Processing data in application memory is much more efficient than storing
and retrieving data from the database server. Adaptive Server now provides the ability to
access external files directly, which may allow more efficient processing of large amounts of
application data.
Normalization and Denormalization

Normalization ensures that the non-key columns of a table are dependent upon the table key
and no other values or tables. The benefits of normalization are that it minimizes duplicate
data, reduces the size of data rows, and keeps each object in its simplest representation.
Normalization allows data maintenance to be simple and efficient, but as it produces
additional tables, it can cause complex data joins when the application retrieves data.

Applications using a fully normalized database may need to join multiple large tables to
perform various tasks. These joins could require a large number of page accesses, either
logical or physical, and this can result in unacceptable response times. If these joins are
performed frequently, the I/O of these joins can seriously affect the overall application
performance. If all other tuning options do not yield sufficient improvement, then
denormalization should be considered.

Denormalization is the process of combining tables, splitting tables, adding derived columns,
or adding redundant columns to reduce the number of page accesses. In cases where multiple
tables are joined to retrieve the required data, the tables can be combined into a single table. If
data is derived from the columns, then these new derived columns can be created in the new
table as data is inserted or modified. If a large table has many columns that are infrequently
used, these columns can be kept in a separate table so that the frequently used columns will
exist in a smaller table, reducing the number of page accesses on that table. This is known as
vertical splitting. If a large table has data that can be separated by an application-processing
column, such as a region, then the table can be split on this column. This is known as
horizontal splitting.
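
For example, the vertical split described above might look like this (a sketch with hypothetical names), keeping the frequently used columns in the smaller table and the rarely used columns in a companion table sharing the same key:

create table customer /* frequently accessed columns */
(
customer_id int not null,
name varchar(60) not null,
state char(2) not null
)

create table customer_detail /* infrequently accessed columns */
(
customer_id int not null,
notes varchar(255) null,
photo image null
)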

The trade-off for denormalization is that it duplicates data and updates become more complex,
require more data to be modified, and take longer to perform. Since denormalization is
application-specific, it requires review during each application enhancement. If tables are
shared between multiple applications, denormalization may improve the performance of one
application at the expense of another.

HTML and XML

Over the next few years, the focus of software development will be on web-based
applications. As companies rush to implement web solutions to address key business needs,
they are often developing systems that are inefficient and not scalable. Many systems have
already been developed that have required redevelopment within months of being delivered.
The reasons for this are that the components being used for development are young and
immature, the developers and architects are unfamiliar with these components, and design
techniques for such applications are also young and immature.

Client software often requires complex business rules to be included. Hypertext Markup
Language (HTML) is the predominant language of web-based client software. HTML was
originally designed for publishing static documents on the Internet and is inadequate for most
user interfaces. Since HTML displays data formatted on the web server, it requires a request
to the web server for any task to be performed. These requests to the web server do not utilize
the resources available on the client. Many companies are also reluctant to allow cookies and
other downloaded content onto the client because of the perceived risk of malicious code
reaching corporate networks.
Tools for developing user interfaces in HTML are very immature and lack functionality. This
requires complex coding in HTML by the developers to achieve simple tasks. HTML has a
very limited and verbose syntax, which makes the development of user interfaces time-
consuming and tedious. HTML provides a mechanism for defining display-formatting tags
around static text. This format does not allow reuse and is very limited in functionality.
JavaScript has been used to allow some dynamic generation of data on the user interface, but
it is also not reusable and often requires requests to the web server to complete tasks.

HTML communication with the web server is stateless: once a client request is
complete, the server retains no information about the client, and any further interaction requires
additional requests to the web server to construct the requested data in HTML formatting tags.
Since HTML does not allow the client to retain application state, all requests have to be
completely reformatted on the web server.

Since all requests have to be reformatted on the web server, the network volume on heavily
used systems can seriously impact performance. The web server has to maintain
all client session information and reconstruct entire pages of data, so it is heavily
burdened by such an inefficient architecture.

Extensible Markup Language (XML) is a new web-based language that attempts to deal with
most of the limitations of HTML. XML allows data to be described and distributed
dynamically using embedded tags. Adaptive Server now provides the ability to process and
store XML data.

The benefits of XML over HTML are that it allows the client to be built dynamically,
requiring fewer trips to the web server for data. The tags can increase the message size
considerably compared to the raw data, but it provides a standard and flexible syntax for
building web-based clients. Just like HTML, the tools for developing user interfaces in XML
are very immature and lack functionality. XML is much more efficient than HTML in total
resource consumption and should be used in preference to HTML unless a very simple and
static web-based client is required.

Application Design Process

The application design process starts with requirements analysis and detailed discussions with
user groups. It is advisable to have a DBA present during such discussions. Once the
requirements are collected from the user community, try to draw a high-level diagram of the data
flow and business process. This will help to give a high-level overview to your developers,
and it is also easier to architect the complete application using graphical tools.
As in any well-thought-out architectural design, the main goal is to satisfy user requirements as
easily and as thoroughly as possible while also providing ample scope for future growth and
change in the business process.

Chapter 10: T-SQL Programming


Introduction

SQL is a set-based language that is very different from the traditional procedural-based
languages. SQL is much more efficient in executing set-based statements than it is executing
single row updates. The provision of cursors in the SQL standard is to allow row-based
processing, but cursors should only be used when each row requires specific processing depending
upon the attribute values of that row. Any tuning or performance tricks are usually DBMS-
specific and are therefore not usually portable via the SQL standard syntax.

Procedures vs. Inline Code

As a general rule, stored procedures are more efficient than inline code if designed correctly.
Stored procedures require fewer bytes to execute, are optimized prior to execution, and are
easier to maintain than inline code. Consider the following SQL code:

select name, last_updated, last_updated_by


from large_table
where id = @param1
and type = @param2
and sequence = @param3

If this SQL was embedded in the client code, the execution call would be approximately 100
bytes.

Create procedure proc1


(
@param1 int,
@param2 int,
@param3 int = 1
)
as
select name, last_updated, last_updated_by
from large_table
where id = @param1
and type = @param2
and sequence = @param3

If this code was a stored procedure, the execution call would be approximately 30 bytes.

exec proc1 @var1, @var2, @var3

Network traffic is therefore minimized by the use of stored procedures.

When a stored procedure is created, it is parsed and stored in the system catalog until
execution. Inline code is parsed each time it is executed. When a stored procedure is first
executed, it is read from the system catalog, optimized, compiled, and loaded into procedure
cache. This version will remain in procedure cache until it is swapped out when not being
used. Any user executing this procedure will get a copy of the version in cache, as long as it is
not in use by another user. While stored procedures are executed from cache, they benefit by
not requiring the optimizing and compilation time that would be needed for each execution of
inline code.

It is good coding practice to pass stored procedure parameters by name and not by position, as this will
minimize code maintenance. For example, in the following execution, proc1 will be
executed with @param1 = @var1, @param2 = @var2, and @param3 = @var3; this call passes the
parameters by position.

exec proc1 @var1, @var2, @var3

If we needed to modify the procedure to include an extra parameter, and this parameter was
not placed at the end of the parameter list, the above call might not consistently produce the
required result.

create procedure proc1


(
@param1 int,
@param2 int,
@param4 int = 0,
@param3 int = 1
)
as
if (@param4 <> 0)
select name, last_updated, last_updated_by
from large_table
where id = @param1
and type = @param2
order by sequence
else
select name, last_updated, last_updated_by
from large_table
where id = @param1
and type = @param2
and sequence = @param3

If we called the stored procedure by parameter name, as in the following example, the
addition of the extra parameter will have no effect on the execution of the stored procedure
since @param3 will have the value of @var3, as defined by the call.
exec proc1 @param1 = @var1, @param2 = @var2, @param3 = @var3

Avoiding the use of select * will also minimize the amount of code maintenance as the
system is modified. Consider the following table and procedure:

create table tableA


(
col1 int not null,
col2 int not null,
col3 char(15) not null
)

create procedure procA


(
@param1 int
)
as
select *
from tableA
where col1 = @param1

When the procedure is executed, the calling procedure or client code will need to assign the
three column values to an array of two integers and a string. If the table is modified by adding
a column, the procedure will now return four columns, and the client or calling procedure will
fail unless it is modified, even if it does not require the use of the new column. If the
procedure is coded returning column names, as in the following example, then the client or
calling procedure will only require amendment if the new column is required since the
procedure will only return the named columns.

create procedure procA


(
@param1 int
)
as
select col1, col2, col3
from tableA
where col1 = @param1

When designing stored procedures, two important factors should be considered - efficiency
and reuse. Consider the following stored procedure:

create procedure proc1


(
@param1 int,
@param2 int = 0
)
as
if (@param2 = 0)
select @param2 = @param1
select name, description
from large_table
where key between @param1 and @param2

Here the procedure accepts two parameters; @param1 is mandatory and @param2 is optional. If
the stored procedure is first called with @param2 set, then the procedure could be optimized to
perform a table scan of large_table. If the procedure is next called with just @param1, it will
still perform a table scan rather than use an index, since the cached plan has already been
optimized for a table scan. Conversely, if the stored procedure is first called with only @param1
set, the procedure could be optimized to use an index on large_table. If the procedure is next
called with @param1 and @param2, it will use the index rather than perform a table scan, even
though the scan might be more efficient for the wider range. This is a common problem with
stored procedures that use parameters in conjunction with the LIKE, <>, and BETWEEN
operators. Whenever possible, modularize stored procedures to reduce unpredictable results
and increase the reuse of code. For example, the code in proc1 above could be written as three
modular procedures.

create procedure proc2


(
@param1 int,
@param2 int = 0
)
as
if (@param2 <> 0)
exec proc3 @param1, @param2
else
exec proc4 @param1

Here, proc2 is a wrapper procedure that only executes either proc3 or proc4, depending upon
the value of the parameters. proc3 and proc4 will be optimized and will run more efficiently
than the original proc1.

create procedure proc3


(
@param1 int,
@param2 int
)
as
select name, description
from large_table
where key between @param1 and @param2

create procedure proc4


(
@param1 int
)
as
select name, description
from large_table
where key = @param1

This technique can be used to reduce the problem caused by the use of the LIKE, <>, and
BETWEEN operators. A wrapper procedure could be used to determine whether a range
should perform a table scan or use an index.

create procedure proc1


(
@param1 int,
@param2 int
)
as
if (@param2 - @param1 > 10000)
exec proc2 @param1, @param2
else
exec proc3 @param1, @param2
create procedure proc2
(
@param1 int,
@param2 int
)
as
select name, description
from large_table(0)
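/* assumption: (0) is the old-style index hint that forces a table scan (index id 0) */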
where key between @param1 and @param2
create procedure proc3
(
@param1 int,
@param2 int
)
as
select name, description
from large_table
where key between @param1 and @param2

Caution should be taken when forcing the use of indexes or table scans, as using this
technique incorrectly will lead to inefficiency, but it is an alternative to executing or creating
the procedure with recompile.

exec proc1 @param1 = @var1, @param2 = @var2 with recompile

Executing a procedure with recompile will cause the copy of the procedure in the procedure
cache to be optimized and recompiled. This can be controlled on the client when unusual
parameters are used, but since the procedure will remain in cache until swapped out, it may be
executed inefficiently by another user.

create procedure proc1


(
@param1 int,
@param2 int
)
as
select name, description
from large_table
where key between @param1 and @param2
with recompile

Creating a procedure with recompile will force a new optimization and compilation each time
the procedure is executed. This is the least efficient execution of a stored procedure and
should only be used if all other options produce unacceptable and unpredictable response
times.

Until version 12.5, stored procedures were limited to 255 parameters. A technique that I have
used to allow more than 255 data elements is to pack values, using delimiters, into a set of
varchar(255) parameters.

create procedure proc1


(
@param1 varchar(255),
@param2 varchar(255) = null,
@param3 varchar(255) = null,
@param4 varchar(255) = null,
@param5 varchar(255) = null,
@param6 varchar(255) = null,
@param7 varchar(255) = null,
@param8 varchar(255) = null,
@param9 varchar(255) = null,
@param10 varchar(255) = null
)
as ...

For example, if we needed to pass proc1 between one and several hundred numbers, we could
pack these numbers into the varchar (255) parameters using a delimiter of ~. Our parameter
would then look something like '12~131~2455~1222~21~'. If each integer on average
consumes five bytes (including the delimiter), each parameter would allow 255/5 = 51 integers;
with ten parameters, this would allow 510 integers, twice as many as would be possible using 255
parameters. All that is required is a character array parsing routine which would return the
numbers. I have written such a parser as a system procedure using SQL, but in 12.5 this could
also be written as a Java user-defined function. This technique is still a valid alternative to hundreds of
parameters in 12.5, especially if most of the parameters are optional, as it may allow for
less and simpler code without the need to check and manipulate hundreds of parameters.

Procedure Return Codes and Error Handling

Good error handling is often neglected. Every executable line of code can produce an error.
By default, each DML statement in Sybase is a transaction. Therefore, it is important that
transactions and errors are handled correctly. For example, in the following procedure, which
performs two related inserts, if the second insert fails, the first insert will remain.

create procedure proc1


(
@param1 int,
@param2 int,
@param3 char(50),
@param4 char(50)
)
as
Insert Table1(col1, col2)
Values (@param1, @param3)
Insert Table2(col1, col2, col3)
Values (@param1, @param2, @param4)

The transaction could be handled from the client or within the code. Care should be taken
when controlling the transaction over a network, as this will tend to increase the length of the
transaction.

The above procedure could be rewritten to include transaction and error handling.

create procedure proc1


(
@param1 int,
@param2 int,
@param3 char(50),
@param4 char(50)
)
as
declare
@error int
begin transaction
Insert Table1(col1, col2)
Values (@param1, @param3)
select @error = @@error
IF (@error != 0)
Begin
Rollback transaction
raiserror 99999 "Error inserting to Table1"
Return 2
End
Insert Table2(col1, col2, col3)
Values (@param1, @param2, @param4)
select @error = @@error
IF (@error != 0)
Begin
Rollback transaction
raiserror 99999 "Error inserting to Table2"
Return 2
End
commit transaction

It is good practice to allow for warnings and errors using the standard return codes from
procedures. I use a return code of 1 for warnings and 2 for errors. This allows the calling
procedure the flexibility to handle errors and warnings separately. I tend to use a return code
of 1 for selects that return no rows, as shown in the procedure below.

create procedure proc1


(
@param1 int
)
as
declare
@error int,
@rowcnt int
select name
from table
where type = @param1
select @error = @@error, @rowcnt = @@rowcount
/*-----------------*/
/* Check for error */
/*-----------------*/
IF (@error != 0)
begin
raiserror 99999 "Error returned from select"
return 2
end
/*-----------------*/
/* Check for rows */
/*-----------------*/
IF (@rowcnt !> 0)
begin
raiserror 99999 "No rows found"
return 1
end
Triggers

Triggers are very important because they can be used to enforce referential integrity and
business rules. Since they use the inserted and deleted special tables, which are not indexed
and are built from the transaction log, they can seriously affect performance if coded poorly.

Over the years, I have used some simple coding techniques to keep triggers as efficient and
flexible as possible.

Since any syntactically correct SQL statements will fire the respective trigger, it is important
that the first statement in any trigger is a check for any rows being updated and to exit the
trigger if no rows are being modified. For example, the following SQL statement will fire an
update trigger on table1 if one exists.

Update table1
Set col2 = @var2
Where col1 = @var1

Assume table1 contains the following rows:

            col1    col2
Row 1       1       Desc1
Row 2       2       Desc2
Row 3       3       Desc3
Row 4       4       Desc4
……..
Row 1000    1000    Desc1000

If the value of @var1 is 0, then the update trigger will fire, but the value of @@rowcount will
be 0. To avoid unnecessary trigger processing, the following line should be the first statement
in any trigger:

if (@@rowcount = 0)
return

Another useful line of code to add to the top of a trigger is the following:

if (suser_name() = 'trigger_bypass')
return

Set up a login with sa_role for this purpose; here I have called it 'trigger_bypass'. If you need to
do any data maintenance and do not want the triggers to fire, then you can use this login for the
updates, since this check (suser_name() returns the current login name) will cause the trigger to
exit immediately.

Prior to 12.5, this could only be achieved by dropping the triggers, updating the data, and
recreating the triggers. This is a process that is prone to errors that can result in data
corruptions if triggers are not rebuilt. In 12.5, there is a no execution option that can be set on
a trigger. Although this is much safer than removing the object, great care is still required to
ensure that all the triggers that have this option set also have it removed after the completion
of any data maintenance. I still favor the use of the special login to bypass the execution of a
trigger; it is simple, efficient, and safe.
Before writing a trigger, certain requirements need to be considered that will determine how
the trigger should be built. The most important decision is whether the trigger will be coded
for single or multiple row changes. This may seem trivial, but triggers that are coded for single
row updates yet receive multiple row amendments cause many data errors. If a trigger is
coded to allow single row updates only, it should have a check at the start to reject multiple
row amendments, as in the following example:

select @inserted_count = count(*) from inserted


/* Allow single row updates only */
if (@inserted_count > 1)
begin
raiserror 99999 "Update to only one row at a time"
rollback trigger
return
end

I use counts in the triggers to check that all updates get correctly applied to all rows. In the
following example, each row updated should have a matching parent row. If the counts do not
match, the trigger is aborted and the statement causing the trigger to fire is rolled back.

select @inserted_count = count(*) from inserted


/* Get count of matching parent rows */
select @process_count = count(*)
from inserted, parent_table
where inserted.key = parent_table.key
/* Check counts */
if (@inserted_count <> @process_count)
begin
raiserror 99999 "Parent row missing"
rollback trigger
return
end

The complexity of a trigger can often determine whether multiple statements can be handled
within a single trigger. In general terms, it is usually a trade-off in performance versus code
duplication and maintenance. I favor the coding of separate triggers for each action, as it tends
to be simpler to code and is more efficient than combining delete, insert, and update triggers
into a single trigger.

Another recommendation is to only put cascade logic in the immediate parent trigger rather
than in all triggers. For example, let's consider three tables - tableA, tableB, and tableC. If
tableA is the parent of tableB and tableB is the parent of tableC, then we could code a cascade
as defined in the following:

Create trigger tableA_cascade


On tableA
For delete
As
Delete tableB
from deleted, tableB
Where deleted.key1 = tableB.key1
Delete tableC
from deleted, tableC
Where deleted.key1 = tableC.key1

Create trigger tableB_cascade


On tableB
For delete
As
Delete tableC
from deleted, tableC
Where deleted.key1 = tableC.key1

The cascade delete of tableC in the delete trigger on tableA is unnecessary and inefficient.
The cascade delete is more efficiently coded and maintained if tableA cascades to tableB and
tableB cascades to tableC, as shown in the following:

Create trigger tableA_cascade


On tableA
For delete
As
Delete tableB
from deleted, tableB
Where deleted.key1 = tableB.key1

Create trigger tableB_cascade


On tableB
For delete
As
Delete tableC
from deleted, tableC
Where deleted.key1 = tableC.key1

Characteristic Functions

An often forgotten or neglected technique for speeding up queries is the use of characteristic
functions. Characteristic functions can be used to obtain results in one pass of the data instead
of several passes using traditional SQL. They are useful when the required result includes
columns whose values depend on a characteristic value held in another column. Consider the
following table:

create table char1


(
region char(10) not null,
amount int not null,
period int not null,
year int not null,
periodd char(10) not null,
yeard char(10) not null
)
            region  amount  period  year  periodd  yeard
Row 1       Reg1    500     1       1999  'Jan'    '1999'
Row 2       Reg2    400     1       1999  'Jan'    '1999'
Row 3       Reg3    300     1       1999  'Jan'    '1999'
Row 4       Reg4    600     1       1999  'Jan'    '1999'
Row 5       Reg5    100     1       1999  'Jan'    '1999'
Row 6       Reg1    250     1       1999  'Jan'    '1999'
Row 7       Reg5    150     1       1999  'Jan'    '1999'
Row 8       Reg1    500     2       1999  'Feb'    '1999'
Row 9       Reg2    400     2       1999  'Feb'    '1999'
Row 10      Reg3    300     2       1999  'Feb'    '1999'
Row 11      Reg4    600     2       1999  'Feb'    '1999'
Row 12      Reg5    100     2       1999  'Feb'    '1999'
Row 13      Reg1    350     2       1999  'Feb'    '1999'
Row 14      Reg4    250     2       1999  'Feb'    '1999'
………

If we wanted to have a total of all the amounts in each period for each region, traditionally
these results could be obtained in several different ways.

select year, period, region, sum(amount)


from char1
group by year, period, region

This would give us the following results:

            year  period  region  sum(amount)
Row 1       1999  1       Reg1    750
Row 2       1999  1       Reg2    400
Row 3       1999  1       Reg3    300
Row 4       1999  1       Reg4    600
Row 5       1999  1       Reg5    250
Row 6       1999  2       Reg1    850
Row 7       1999  2       Reg2    400
Row 8       1999  2       Reg3    300
Row 9       1999  2       Reg4    850
Row 10      1999  2       Reg5    100
……

The result set returns a large number of rows and needs further manipulation to represent the
answer to the original question. These results could be further refined on the client or on the
server by taking these results and breaking them down by period.

select year, period, region, amount = sum(amount)


into #temp1
from char1
group by year, period, region

select year, region,


'Jan' = sum(amount),
'Feb' = 0,
'Mar' = 0,
'Apr' = 0,
'May' = 0,
'Jun' = 0,
'Jul' = 0,
'Aug' = 0,
'Sep' = 0,
'Oct' = 0,
'Nov' = 0,
'Dec' = 0
into #temp2
from #temp1
where period = 1
group by year, period, region

update #temp2
set Feb = amount
from #temp1
where #temp1.period = 2
and #temp2.year = #temp1.year
and #temp2.region = #temp1.region

update #temp2
set Mar = amount
from #temp1
where #temp1.period = 3
and #temp2.year = #temp1.year
and #temp2.region = #temp1.region

... ... ...

Here we would need an update for each period to obtain the required result set. This is not
very efficient. However, we can achieve the required result set by the use of a characteristic
function.

select year, region,


"Jan" = sum(amount*(1-abs(sign(period-1)))),
"Feb" = sum(amount*(1-abs(sign(period-2)))),
"Mar" = sum(amount*(1-abs(sign(period-3)))),
"Apr" = sum(amount*(1-abs(sign(period-4)))),
"May" = sum(amount*(1-abs(sign(period-5)))),
"Jun" = sum(amount*(1-abs(sign(period-6)))),
"Jul" = sum(amount*(1-abs(sign(period-7)))),
"Aug" = sum(amount*(1-abs(sign(period-8)))),
"Sep" = sum(amount*(1-abs(sign(period-9)))),
"Oct" = sum(amount*(1-abs(sign(period-10)))),
"Nov" = sum(amount*(1-abs(sign(period-11)))),
"Dec" = sum(amount*(1-abs(sign(period-12))))
from char1
group by year, region

The expression period-1 gives 0 for period = 1 and a value greater than 0 for all other values of
period, so sign(period-1) evaluates to 0 for period = 1 and +1 for all other positive values of period.

abs(sign(period-1)) also evaluates to 0 for period = 1 and +1 otherwise; the abs() is
included for completeness for situations in which periods may have negative numbers.

So the expression 1-abs(sign(period-1)) evaluates to 1 for period = 1 and 0 for all other
values of period. The result of this SQL expression is to sum all the amounts into the periods
they match. This query will return the required results in one pass of the data. This technique
is not limited to numeric category values. We could rewrite the query to use string categories
for the period:

select year, region,


"Jan" = sum(amount*charindex(periodd,"Jan ")),
"Feb" = sum(amount*charindex(periodd,"Feb ")),
"Mar" = sum(amount*charindex(periodd,"Mar ")),
"Apr" = sum(amount*charindex(periodd,"Apr ")),
"May" = sum(amount*charindex(periodd,"May ")),
"Jun" = sum(amount*charindex(periodd,"Jun ")),
"Jul" = sum(amount*charindex(periodd,"Jul ")),
"Aug" = sum(amount*charindex(periodd,"Aug ")),
"Sep" = sum(amount*charindex(periodd,"Sep ")),
"Oct" = sum(amount*charindex(periodd,"Oct ")),
"Nov" = sum(amount*charindex(periodd,"Nov ")),
"Dec" = sum(amount*charindex(periodd,"Dec "))
from char1
group by year, region

The expression charindex(periodd,"Jan ") will give 1 for periodd = "Jan " and 0 for all other
values of periodd. This query will also return the required results in one pass of the data.

Subqueries

Subqueries can be correlated or non-correlated. Care needs to be taken when choosing the
inner and outer tables and the type of subquery. Non-correlated subqueries are generally
thought of as being more efficient than correlated subqueries, but this isn't necessarily true.
Consider the following tables:

create table tableA


(
col1 int not null,
col2 int not null
)

create table tableB


(
col1 int not null,
col2 int not null,
col3 int not null,
col4 smallint not null
)

TableA has 875 rows and tableB has 32646 rows.

The following non-correlated subquery:

set showplan on
go
set statistics io on
select col1, col2
from tableA tba1
where tba1.col1 in
( select tbb.col1
from tableA tba2, tableB tbb
where tbb.col2 = 101
and tba2.col2 = tbb.col2 )
set statistics io off
set showplan off
go

yields 30 rows and the following work:

Table: tableA scan count 2700, logical reads: (regular=5580 apf=0


total=5580), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
Table: tableB scan count 1, logical reads: (regular=253 apf=0 total=253),
physical reads: (regular=0 apf=0 total=0), apf IOs used=0
Table: tableA scan count 90, logical reads: (regular=990 apf=0 total=990),
physical reads: (regular=0 apf=0 total=0), apf IOs used=0
Table: Work table1 scan count 0, logical reads: (regular=2756 apf=0
total=2756), physical reads: (regular=0 apf=0 total=0), apf IOs used=0

While this correlated subquery:

set showplan on
go
set statistics io on

select col1, col2


from tableA tba1
where tba1.col1 in
( select tbb.col1
from tableB tbb
where tbb.col2 = 101
and tba1.col2 = tbb.col2 )
set statistics io off
set showplan off
go

yields 30 rows and the following work:

Table: tableA scan count 1, logical reads: (regular=11 apf=0 total=11),


physical reads: (regular=0 apf=0 total=0), apf IOs used=0
Table: tableB scan count 30, logical reads: (regular=90 apf=0 total=90),
physical reads: (regular=0 apf=0 total=0), apf IOs used=0

The correlated subquery could be flattened as a join, as follows:

set showplan on
go
set statistics io on
select tba1.col1, tba1.col2
from tableA tba1, tableB tbb
where tba1.col1 = tbb.col1
and tbb.col2 = 101
and tba1.col2 = tbb.col2
select @@rowcount
set statistics io off
set showplan off
go

yielding 30 rows and the following work:

Table: tableA scan count 1, logical reads: (regular=11 apf=0 total=11),


physical reads: (regular=0 apf=0 total=0), apf IOs used=0
Table: tableB scan count 30, logical reads: (regular=90 apf=0 total=90),
physical reads: (regular=0 apf=0 total=0), apf IOs used=0

The non-correlated subquery could be rewritten with the inner/outer tables reversed, causing a
large increase in the amount of work required to satisfy the query:

set showplan on
go
set statistics io on
select col1, col2
from tableB tbb1
where tbb1.col1 in
( select tbb2.col1
from tableB tbb2, tableA tba
where tbb2.col2 = 101
and tba.col2 = tbb2.col2 )
set statistics io off
set showplan off
go

yielding 30 rows and the following work:

Table: tableB scan count 2700, logical reads: (regular=8100


apf=0 total=8100), physical reads: (regular=0 apf=0 total=0), apf IOs
used=0
Table: tableB scan count 90, logical reads: (regular=7590 apf=0
total=7590), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
Table: tableA scan count 1, logical reads: (regular=11 apf=0
total=11), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
Table: Work table1 scan count 0, logical reads: (regular=8252 apf=0
total=8252), physical reads: (regular=0 apf=0 total=0), apf IOs used=0

Simple correlated subqueries tend to be more efficient than non-correlated subqueries.


Writing subqueries in different ways can lead to large changes in the amount of work, and
careful thought and experimentation can lead to large performance gains. The best method
will be dependent upon the available indexes, the number of rows, and the cardinality of the
attributes.

Transaction Nesting Levels and Modes

Sybase transactions are unchained by default, whereas the SQL standard specifies chained mode.
Adaptive Server supports the concept of transaction nesting, but only the outermost begin,
commit, or rollback controls the actual transaction.

Care should be taken after a rollback since a rollback will behave differently, depending upon
whether it is invoked from a trigger or procedure. A rollback transaction from a trigger will
roll back all statements in the current batch and will not execute any further statements in the
batch. It will, however, execute the next sequential statement in the trigger. A rollback trigger
will roll back all statements in the current batch and will not execute any further statements in
the batch. It will not, however, execute the next sequential statement in the trigger. A rollback
transaction from a procedure will roll back all statements in the current batch and will execute
the next sequential statement in the batch. Clearly, care needs to be taken when nesting
transactions across triggers and procedures since the transaction can terminate with different
results depending on the source of the error.
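
A minimal sketch of the nesting behavior, with the value of @@trancount shown in the comments (table and column names are hypothetical):

begin transaction            /* @@trancount = 1 */
begin transaction            /* @@trancount = 2; only the nesting level increases */
update account
set balance = 0
where account_id = 1
commit transaction           /* @@trancount = 1; nothing is actually committed yet */
commit transaction           /* @@trancount = 0; the outermost commit writes the work */
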
Java Functions

SQL is predominately a medium for storing and retrieving data and was never intended as an
extensive data manipulating language. The addition of several system functions has added
flexibility to the standard set of SQL operands to allow more extensive data manipulations.
The addition of Java in the database will allow more extensive data manipulations to be
possible without the need for an external client program to extract, manipulate, and save the
data back to the database. Java functions can be used to supplement the standard system
functions and allow business rules and user-defined datatypes to be more easily and
efficiently applied.

Chapter 11: Optimizing Stored Procedures


Introduction

Recently, one of my developers created a stored procedure that took over seven minutes to
run. I set statistics io on (covered later in the chapter) and ran the stored procedure. What I
noticed was a very high number of reads from table X. Looking at the query, I saw the need for
an additional index on table X and added one on the appropriate column. After adding the index,
I reran the procedure, which completed in less than 20 seconds. Optimizing procedures can be as
easy as that.

Where Do I Start?

1. Establish a proper testing environment.
2. Plan your stored procedures correctly.
3. Review testing objectives.
4. Debug the stored procedure.
5. Measure the performance of the stored procedure.
6. Make changes to the stored procedure and resume testing.
7. Make changes to the database if necessary.

Optimizing Stored Procedures

Stored procedures, triggers, and other compiled objects require more memory to run in ASE
12.5 than in older versions. The memory required to run a stored procedure increased by 20
percent between versions 10.x and 11.5. Adaptive Server 12.5 needs approximately 4 percent
more procedure cache than 11.5 for the server to maintain the same performance.

According to some publications and experts in the field, 75 to 85 percent of performance gains
on databases come from improvements in the Transact-SQL code rather than from configuration
tuning. This goes hand in hand with stored procedure optimization, since poorly written stored
procedures can bring a database server to a crawl.

The advantages of stored procedures are:


• A stored procedure is a precompiled execution plan that can be executed many times
by many users. A precompiled plan can be tuned and optimized extensively compared
to user ad hoc queries.
• A stored procedure is a reusable object that can be shared by other procedures and
users.
• Stored procedures reduce operator errors.
• Stored procedures can be used to implement object-level security. Permissions can be
granted exclusively on the stored procedures and not on the underlying tables or views,
ensuring all access is controlled (see the sketch after this list).
• Stored procedures assist in enforcing consistency.
• Stored procedures reduce network traffic and alleviate some of the load on the server.
The database server does not have to spend resources and time recompiling each user
query, and instead it uses precompiled plans.
• Stored procedures reduce application maintenance since changes to code are done in
one central location. It is easier and preferable to change the code in one place, the
stored procedure, than in multiple copies of application code.
• Stored procedures are helpful in ensuring referential integrity, which may not be
achieved by ad hoc queries.
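As a minimal sketch of the object-level security point above (the procedure, table, and user
names are hypothetical), users can be given execute permission on a procedure while being
denied direct access to the underlying table:

grant execute on usp_get_client_orders to clerk
revoke select, insert, update, delete on ClientOrders from clerk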

Storing and Executing Procedures

When you create a stored procedure, the server stores the text of the procedure in the
syscomments table of the database at hand. In addition, it builds and stores the query tree, a
normalized form of the procedure, in the sysprocedures table. The server uses the query tree
to create a query plan, which it uses to execute the procedure.

Building the Query Tree: Resolution

The process of building a query tree is called resolution. This process parses the SQL
statements into a more efficient format and resolves all objects involved into their internal
representations so that an optimized data path can be built for use during execution of the
procedure; table names are resolved into their object IDs, and column names are resolved into
their column IDs.

Building the Query Plan: Compilation

The process of building a query plan in ASE is called compilation. Adaptive Server builds the
query plan during the first execution of a stored procedure. When the procedure executes,
Adaptive Server reads the corresponding query tree from the sysprocedures table and loads it
into the procedure cache. The server then creates a query plan and places it in the procedure
cache. The query plan is the optimized data access path that Adaptive Server uses to execute
the procedure.

Adaptive Server determines the optimal data access path and builds the query plan based on
the following information:

• The SQL stored in the query tree


• Statistics for each table, index, and available inner columns referenced in the
procedure
• The values of any parameters passed to the procedure on the first execution

Since query plans are held only in procedure cache and not on disk, they must be rebuilt if
you restart Adaptive Server after the procedure is executed.

Multiple Users and the Query Plan

Stored procedures are reusable, not reentrant. This simply means that only one user at a time
can execute a given copy of a procedure's query plan. If two or more users try to execute the
same procedure at the same time, Adaptive Server creates an additional query plan (not a
copy of the first), which is based on the parameters used in the second (or later) execution.
When a user finishes using the procedure, the query plan is available in cache for reuse by
anyone with execute permissions.

If two users run a stored procedure at the same time, and the server must generate a second
query plan for a stored procedure, there is no guarantee that it is the same as the first one. If
you pass a different set of parameters to create the second invocation, the query plan may be
different.

Also, a new query plan for a given procedure may be different from an older query plan if you
make changes to the database, such as adding an index to a referenced table, adding inner
column statistics, or updating the statistics after the older query plan is generated. Adding an
index or updating statistics does not force recompilation, but the newly compiled query plan
may be different from the old one since it takes into account this new information. The
additional query plans, which stay in procedure cache until they are swapped out, could cause
one execution to run very differently from another, although the returned results are the same.

Resolving Execution Plan Differences

The database administrator or user has no control over or knowledge of which execution plans
are in cache and, therefore, which execution plan is used for a given execution. This may
explain unexpectedly different execution times for the same procedure given the same data
and parameters. If you suspect this is the situation in your case, drop and recreate the
procedure or procedures. This will cause all existing plans to be flushed out of cache.

To ensure that you always get your own plan, you can use exec with recompile or create with
recompile. Creating a procedure with the with recompile option decreases performance
because every execution causes a compilation. Use this option only if you know for sure that
you need to recompile the stored procedure at every execution.
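A minimal sketch of both options follows; the procedure name, parameter, and columns are
illustrative:

-- Compile a fresh plan on every execution
create procedure list_orders
    @cutoff datetime
with recompile
as
    select * from ComOrder where date >= @cutoff

-- Or force a new plan for a single execution of an existing procedure
exec list_orders "June 7 2001" with recompile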

Procedure Recompilation

Since the process of building a query plan is called compilation, recompilation is the creation
of a new query plan from the existing query tree. Recompilation automatically takes place
whenever one of the following events occurs:

• The procedure is loaded from disk to the procedure cache.


• You drop an index on any table referred to in the procedure.
• All copies of the execution plan in cache are currently in use and another user wants to
execute the procedure.
• You execute a procedure using the with recompile option.
• A database administrator flags a table with the sp_recompile stored procedure. This
causes Adaptive Server to re-resolve and then recompile any procedures or triggers
that access that table when they are next executed.

Dropping an index or a table referenced by a query causes Adaptive Server to mark the
affected procedure as needing re-resolution and recompilation at execution time.

Neither the update statistics nor the create index command causes an automatic recompilation
of stored procedures.
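If you want a table's dependent procedures and triggers to pick up new statistics, you can flag
the table yourself; a minimal sketch, using a table name from earlier examples:

update statistics ComOrder
go
sp_recompile ComOrder
go

Every procedure and trigger that references ComOrder will then be recompiled the next time it
is executed.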

Procedure Re-resolution

Like recompilation, re-resolution causes the generation of a new plan. In addition, re-
resolution updates the existing query tree in the sysprocedures table.

Re-resolution occurs when one of the tables changes in such a way that the query tree stored
in the sysprocedures table may be invalid. The datatypes, column offsets, object IDs, or other
parts of the table may have changed. In this case, Adaptive Server must rebuild some parts of
the query tree.

Adaptive Server re-resolves procedures after you do any of the following:

• Execute load database on the database containing the procedure.


• Execute load database on a database in which a referenced table resides.
• Drop and re-create any table used or referenced by the procedure.
• Bind or unbind a default or rule to a table referred to by a query in the procedure.

To curtail the growth of query trees and plans, periodically drop and recreate all stored
procedures and triggers.

The Planning Phase

You should ask the following questions when writing a stored procedure:

• What are the objectives of the stored procedure?


• What are the business requirements?
• Are there any performance targets?
• Are there any constraints due to database design?

When answering these questions, try to be as objective as possible. The case might be that
because of a deficient database design, you will not be able to meet your performance targets.
Study and review the above questions before diving headfirst into an empty pool.

Testing and Debugging Stored Procedures

• Check to see that the stored procedure works - Before any performance testing
begins, and after you've created your stored procedure, a big question hangs in the air:
Does it work? Run your stored procedure on a testing database that contains a minimal
amount of data. Once the stored procedure compiles and returns correct results, you
can continue to the next step.
• Use a small subset of data - When testing a stored procedure, it is important to work
with a small subset of data. If the stored procedure is supposed to return thousands of
rows of data, and you have yet to optimize your procedure, you may be sitting and
waiting for results. Before testing the performance of a stored procedure, and while
testing it for business logic and requirements, it is preferable to use as small a data set
as possible without compromising your testing.
• Check and test if all logic performs correctly - When checking the stored procedure,
make sure it does not simply return rows. The fact that the stored procedure works is
great, of course, but is it returning the right results? Check the logic of the stored
procedure and the result set, and make sure they are identical. You should never start
testing for performance until you are sure every part of the stored procedure's logic
works correctly every time.
• Check on a larger subset of data - Once you have verified that the stored procedure
works correctly and is in fact returning logical results, you should move to test it on a
larger subset of data. Again, if you are expecting thousands of rows returned, you
should use an intermediary step to ensure that the procedure is returning the right
results and works in an efficient manner on a slightly larger subset of data. If you
notice performance problems, this would be the time to tune it. Using this intermediate
step can save time and frustration and discover performance problems before you have
to sit and wait for results to come back.

Debugging Techniques

• Make sure there is only one version of the stored procedure in the database or in the
environment where the stored procedure is located.
o No spaghetti code. Simplify the procedure as much as possible.
o Insert as many comments as possible.
o Check for common errors, especially typos, and missing joins, parameters,
table aliases, etc. One major factor in poor performance is datatype mismatch.
• Divide and isolate the various blocks of code.
o Use a debugging tool or simply run individual queries from the stored
procedure.
o Verify the variable or parameter value during run time.
• Check recent changes.
o Return to the previous version of the code and compare the differences. Make
sure the new code is compatible with the older code and supports all business
logic.
• If you encounter an error, try to reproduce it in as many ways as possible. This will
help you pinpoint the cause of the error.
• Brainstorm along with developers or others who understand the database or the stored
procedure logic, application logic, etc., for possible causes.

Using SARGs (Search Arguments)

A search argument (SARG) is used to construct the access path to one or more data rows
when a column in an SQL WHERE clause matches the first column in an index. The index is
used to locate and retrieve rows in the table that match the SARG and the data is then read
into the data cache if it does not already reside there. The WHERE clause can match more
than one column if it is a composite index, but it must always match the leading column.
A SARG is coded in one of the following styles:

<column> <operator> <expression>
<expression> <operator> <column>
<column> is null

Column represents the name of a column in a table.

In the examples above, the following rules apply:

• The operator can be one of the following: =, >, <, <=, >=, !=, !>, !<, <>, or is null.
The non-equality operators, <> and !=, are special cases. The optimizer will check for
covering non-clustered indexes if the column is indexed and use a non-matching index
scan if an index covers the query. If the index does not cover the query, however, the
table will be accessed via a table scan.
• Expression is either a constant or an expression that evaluates to a constant, like:
column_name = (720 * 9).

An example:

The column Client_name in the client table has an index on it:

Select Client_name, Client_volume
From Client
Where Client_name = "VeryBigBank"

The optimizer, in the example above, will use the index on Client_name to find matching
rows. In the following example, both columns, Client_name and Client_volume, have indexes
on them. The optimizer, in this case, will use internal statistics to find the 'cheapest' index to
access the data pages and return data.

Select Client_name, Client_volume
From Client
Where Client_name = "VeryBigBank"
And Client_volume = "10000"

Matching Datatypes in SARGs

One common problem affecting the use of SARGs is datatype mismatch. Since the optimizer
converts mismatching datatypes, indexes on these columns may not be usable, resulting in the
optimizer choosing to scan the table. Mismatches occur for the following reasons:

• Search clauses using variables or stored procedure parameters that have a different
datatype than the column, such as int_col = @money_parameter (see the sketch after this
list).
• Join queries in which the columns being joined have different datatypes, such as
tableA.int_col = tableB.money_col.
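A minimal sketch of the first case, using the tableA and int_col names from the examples above
(the procedure names are hypothetical):

-- Mismatched: the money parameter must be converted, so the index on int_col
-- may not be usable and the optimizer may choose a table scan
create procedure find_row_bad @key money
as
    select * from tableA where int_col = @key

-- Matched: declaring the parameter with the column's datatype keeps the SARG
-- usable for index selection
create procedure find_row_good @key int
as
    select * from tableA where int_col = @key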

The most common types of mismatches are:

• Comparisons between datetime and smalldatetime.


• Comparisons between numeric and decimal types of differing precision and scale.
• Comparisons between the integer types, int, smallint, and tinyint.
• Comparisons between money and smallmoney.
• Comparisons between numeric or decimal types and integer or money columns.

To avoid these mismatches, use the same datatype (including the same precision and scale)
for columns that are likely join candidates when you create tables. Use matching datatypes
when defining any variables or stored procedure parameters used as search arguments. When
mismatches occur, the datatypes hierarchy, contained in the systypes table, determines
whether an index can still be used. The basic rules are:

• For search arguments, the index can be used if the column's datatype is the same as or
precedes the hierarchy value of the parameter or value.
• For a join, the index can be used only on the column whose hierarchy value has
precedence - that is, it comes first in the datatypes hierarchy. To see the hierarchy of
datatypes, you can run the following simple query: select hierarchy, name from systypes
order by hierarchy. When looking at the output of this query, remember that the lower
the numeric hierarchy value, the higher the precedence. There are, of course, exceptions.

When comparisons are made between char and varchar, or between binary and varbinary,
columns/parameters, these datatypes are considered identical even though their hierarchy values
differ; two datatypes are also considered to be the same if the only difference between them is
their null status (null/not null).

Comparisons of decimal or numeric datatypes also take precision and scale into account. This
is true for comparisons of numeric or decimal datatypes to each other and to different
datatypes such as int and money. For a join involving numeric or decimal datatypes, an index
can be used on a column only if both its scale and length equal or exceed the scale and length
of the other column. If, for example, you join a numeric(15,4) column to a numeric(14,4)
column, the index can be used only on the numeric(15,4) column, since the length of
numeric(14,4) is smaller than the length of numeric(15,4).

A mismatch in datatypes can interfere with the optimizer's ability to use statistics, causing the
server to rely on 'magic numbers,' which can be misleading. The magic numbers are:

• For an equality test, assume that 10% of the table is to be returned.


• For a greater-than or less-than test, assume that 33% of the table is to be returned.
• For a range test, assume that 25% of the table is to be returned.

Examples of valid SARGs:

au_lname= "Zebra"
Price >= $29.00
au_lname like "Zeb%" and price > $17.50
Advance > $9000 and advance < $15000

Examples of invalid SARGs:

Advance * 3 = 15000
Substring(au_lname, 1, 3) = "Zeb"

These invalid SARGs can be optimized if rewritten as follows:

Advance = 15000/3
au_lname like "Zeb%"

The second invalid SARG can be further optimized by rewriting it as follows:

au_lname >= "Zeb" and au_lname < "Zec"

When tuning search arguments, however, always make sure your changes don't violate
processing logic or business rules.

Avoid equality checks on float datatypes. Float variables are approximate values and cannot
store exact values. For more information on the float datatype, refer to IEEE 754
specifications.

SARG Guidelines

• Eliminate functions, arithmetic operations, and other expressions on the column side
of SARGs (Advance * 3 = 15000).
• Use appropriate operators (from the examples above).
• Eliminate incompatible datatypes.
• Use selective SARGs to give the optimizer as many hints as possible.
• Make sure you always use the leading column of composite indexes.
• Use showplan to check on the SARGs selection (covered later in this chapter).
• Optimize the WHERE clauses for update and delete transactions. Non-optimized
DMLs will take an exclusive table lock, completely blocking access to the table for the
duration of the transaction.
• Keep transactions as short as possible and avoid user interaction within transactions
(otherwise, we get the famous 'begin tran…go to lunch…end tran' example).
• Avoid or at least minimize the use of while loops.
• Use select 1 instead of select * in EXISTS clauses.
• Use EXISTS instead of select count wherever possible (see the sketch after this list). An
EXISTS clause stops after the first match, as opposed to a select count, which must read
every row that matches the criteria to get a total count.
• Use >= and <= instead of > and < if possible. >= can save substantial I/O in some
cases since it reads fewer rows (x > 3 starts scanning at the first row with the value 3
and discards it; x >= 4 positions directly at 4).
• Avoid mathematical manipulation in WHERE clauses (WHERE z / 17 > 42 ).
• Use your WHERE clause correctly. WHERE name=@x is useful for index selection,
while WHERE @x=name is not.
• Give as many hints to the optimizer as possible. Instead of using just WHERE
x.id=y.id and y.id=z.id, add 'and x.id=z.id.'
• Use the LIKE operator correctly. Do not use wildcards at the beginning of the pattern
(e.g., WHERE column like '%eb' will not use an index).
• Not equal (!=) expressions will be used for index selection only when the index covers
the query (can return the results via the index without accessing the table).
• If using an OR clause that references two distinct columns, ensure that both are
indexed. Otherwise, the optimizer will scan the table.
• Use the non-logged select into operation for faster performance whenever possible.
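A minimal sketch of two of the guidelines above (moving arithmetic off the column side and
using EXISTS instead of count); the order_total column is hypothetical:

-- Non-SARGable: arithmetic on the column side, and count(*) where a simple
-- existence test is all that is needed
if (select count(*) from ComOrder where order_total / 2 > 5000) > 0
    print "large orders exist"

-- SARGable rewrite: the expression is moved off the column, and exists stops
-- at the first qualifying row
if exists (select 1 from ComOrder where order_total > 10000)
    print "large orders exist"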

Note In versions prior to 12, Sybase recommended using exists and in instead of not exists and
not in. These recommendations were dropped from the manuals in version 12. In addition,
some performance tests we've done show that there is no, or very minimal, difference
between the two.

Temporary Tables and Stored Procedures

It's common practice to use temporary tables heavily in stored procedures, but if you process a
temporary table within the same stored procedure in which it was created, the optimizer has
no statistics about the size of the table or the cardinality of its data, even if you create an index
on the table. Consequently, the optimizer bases its access plan selection on an assumption that
the table contains 100 rows of data spread across ten data pages. This can spell disaster if your
temporary table actually contains a few million rows. Then again, at least you will have time
to update your resume while waiting for your query to complete! To avoid this problem, split
your temporary table processing into separate stored procedures. Create, populate, and index
the temporary table in one procedure and perform all additional processing in a second
procedure, which is called by the first. This ensures that the optimizer will have enough
information about the temporary table to formulate the correct access plan.

Example:

Create procedure X as
Select * into #Client from Client where date = "June 7 2001"
Select * from ComOrder, #Client where ...

Instead, use:

1. Create an empty temporary table so that procedure X2 will not fail when it is created:

Select * into #Client from Client where 1=2 (where 1=2 creates an empty table)

2. Create procedure X2:

Create procedure X2
as
Select * from ComOrder, #Client where ...

3. Create procedure X, which populates and indexes the temporary table and then calls X2:

Create procedure X
as
Select * into #Client from Client where date = "June 7 2001"
Create unique clustered index Xclient on #Client(client_num)
Exec X2

We have seen extreme performance improvements when making such changes. A stored
procedure that took over 20 minutes to complete ran in less than 30 seconds after this change
was implemented.

The new maximum number of arguments for stored procedures is 2048 (up from 255 in 12.0).
For each argument, however, the server must set up and allocate memory for various internal
data structures. Any increase in the number of arguments, therefore, may cause performance
degradation for queries that deal with larger numbers of arguments.
The maximum size for expressions, variables, and arguments passed to stored procedures is
16384 (16K) bytes for any page size (up from 255 bytes in previous versions). This can be
either character or binary data.

Earlier versions of Adaptive Server had a maximum size of 255 bytes for expressions,
variables, and arguments to stored procedures. Any scripts or stored procedures written for
earlier versions of Adaptive Server that relied on this old maximum may now return larger
string values because of the larger limit. As a result, Adaptive Server may truncate the
string, or the string may cause an overflow if it is stored in another variable or inserted
into a column or string. If columns of existing tables are modified to increase the length of
character columns, you must also modify any stored procedures that operate on data in these
columns to reflect the new length.

select datalength(replicate("z", 500)),


datalength("abcdefgh....255 byte long string..." + "xxyyzz ... another 255
byte long string")
----------- -----------
255 255

Evaluating Performance

When evaluating performance, you should start with the basic tools before employing more
advanced techniques. Use ASE optimization tools as follows:

Basic:

Set showplan on/off.

Set noexec on/off.

Set statistics io on/off.

Set statistics time on/off.

Set fmtonly on/off.

Advanced:

dbcc traceon(302) and dbcc traceon(310)

Set forceplan.

Set table count.

Select, delete, update clauses with (index...prefetch...mru_lru...parallel).

Set prefetch.

Set sort_merge.

Set parallel_degree.

sp_cachestrategy.

When you need more information about your queries and how ASE processes them, the basic
tools above can help you. If they do not solve your problem, you can move on to the advanced
tools, knowing that they may take more time and effort. The following sections present ways to
read the server's choices and help you decipher how your stored procedures are run, how long
they take to execute, and what the heck they were doing for so long.

Note If the output of the stored procedure contains a large results set, output the results to a
file:

isql:
isql -U username -P password -S servername -e -i input_file -o output_file

sqsh (same as isql, or):
select ... from ... where ...
go > output_file

showplan

Tuning using the showplan option prints out your SQL's 'query plan,' which is the optimizer's
choice with regard to I/O usage, indexes, and processing.

set showplan on displays the steps performed for each query in a batch. Using this in
conjunction with the noexec feature will enable you to see the showplan without having to
wait for the query to complete; this is especially helpful if your query is very slow or
processes a very large number of rows.

To activate showplan:

set showplan on
go

To activate noexec:

set noexec on
go

Remember, though, once you activate the noexec option, all commands executed after this are
not executed. Once you have completed your analysis, turn off noexec immediately.

Statistics Time

Statistics time tells ASE to display the number of CPU ticks and milliseconds needed to
parse and compile the query, along with the ticks and milliseconds needed to execute each
step of the command. The results of statistics time can sometimes be misleading, since elapsed
time includes, for example, the time spent transferring data to the client.

Parse and compile time The number of CPU ticks taken to parse,
optimize, and compile the statement. By default, a
CPU tick on ASE is 100 milliseconds.
Execution time The number of CPU ticks taken to execute the
statement.
CPU time The total number of CPU milliseconds needed to
execute the query.
Elapsed time The difference between the time the command
started and when it ended in clock time.

To convert ticks to milliseconds:

milliseconds = CPU_ticks * clock_rate / 1000

To see the clock_rate (the clock tick length, in microseconds) for your system, execute:

sp_configure "sql server clock tick length"

statistics io

statistics io is a very useful tool in determining the source of performance problems. The set
statistics io command tells ASE to report the number of table scans, logical reads, physical
reads, and writes for each table used in the query.

Scan count The number of times the table was accessed.
Logical reads Total number of 2K pages read.
Physical reads Total reads from disk, regardless of size.
Total writes The number of pages written to disk.

fmtonly

If you do not want the procedure to fully execute, but still want to obtain the necessary output,
run the following set command prior to running the query:

set fmtonly on

This allows the procedure to compile but not execute. You will not see the output of set
statistics io or statistics time because the procedure does not execute. However, you will see
the query plan, dbcc traceon output, and the column headers for the query's result set.

forceplan

forceplan forces the query to use the tables in the order specified in the FROM clause. Use
this if you suspect the optimizer is not making the right choices, and compare results to the
access plan originally selected by the optimizer.

set forceplan on
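A minimal usage sketch; the column names are hypothetical:

-- Force the join order to follow the FROM clause exactly, then compare the
-- plan and I/O figures to the optimizer's original choice
set forceplan on
go
select o.order_id, p.portfolio_id
from ComOrder o, ComPortfolio p        -- joined in exactly this order
where o.portfolio_id = p.portfolio_id
go
set forceplan off
go
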
Table Count

Table count increases the number of tables that the optimizer considers at one time while
determining join order - the default is 4 for all joins involving <= 25 tables. You can increase
or decrease the table count to assist the optimizer. However, increasing the table count means
that compilation time will also increase, as the optimizer will spend more time evaluating
various join combinations. Make sure that you receive an increase in performance that
justifies the increase in compilation time.
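For example, to have the optimizer consider six tables at a time when costing join orders:

set table count 6
go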

Specifying a Strategy (index...prefetch...mru_lru... parallel)

You can explicitly specify the index, I/O size, or cache strategy to use for a given query in the
code itself.

select, delete, update clauses with (index...prefetch...mru_lru...parallel)

Use this if you suspect the optimizer is not choosing the right index or I/O size, for example,
or to test the performance of a different cache strategy (MRU or LRU) or degree of parallelism.
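A minimal sketch using the Client table from earlier examples; the index name is hypothetical:

-- Force a specific index, 16K I/O for the data pages, and the MRU
-- (fetch-and-discard) buffer replacement strategy for this query only
select Client_name, Client_volume
from Client (index Client_name_ix prefetch 16 mru)
where Client_name like "Very%"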

sp_cachestrategy

Use this to set status bits to enable or disable prefetch and fetch-and-discard cache strategies.

sp_cachestrategy db_name, table_name
sp_cachestrategy db_name, table_name, index_name
sp_cachestrategy db_name, table_name, "table only", mru, "on"
sp_cachestrategy db_name, table_name, "table only", prefetch, "on"

In order, the examples above show the caching strategy for a table, show the caching strategy
for a given index on that table, activate the MRU caching strategy for the table, and activate
prefetch for the table.

Prefetch and mru caching default to on for all objects. You can override this at execution time
if testing shows this will improve performance. However, if prefetching is turned off for an
object, you cannot override and turn it on; you can only turn it off for tables for which it is on.

sort_merge

sort_merge allows or disallows sort-merge joins and join transitive closure for a specific
query. This can also be set as a server-wide configuration parameter. The default status is off.

set sort_merge {on|off}

parallel_degree

This specifies the maximum number of worker processes that can be used for the parallel
execution of a query. The number specified must be <= the value of the max parallel degree
configuration parameter. The @@parallel_degree global variable always stores the current
setting.

set parallel_degree 5
Using the Tools

There is no right or wrong way to test your queries. With time, each person will develop his
or her own technique for testing. This may depend largely on the knowledge level of the
person, the database structure, and a million other reasons. Let's walk through a simple tuning
example.

Assuming you have encountered performance issues with one of your stored procedures, first use
statistics io to measure the resources used by the stored procedure and look for obvious
trouble spots:

Table: ComOrder logical reads: (regular=4951 apf=1595 total=6546)
Table: ComEntry logical reads: (regular=1894 apf=1816 total=3710)
Table: ComPortfolio logical reads: (regular=1428765 apf=0 total=1428765)

You can immediately tell from this example that something may be wrong with the
ComPortfolio table. Now that you have a hint about where your problem is, you should look
at your procedure's, or a specific piece of SQL's, access plan:

set showplan on
set noexec on
go

Run your procedure again and see why the 'troubled' regions of the query act the way they do.
showplan and statistics io can give you almost all the information you need to understand what
your query is doing.

Interpreting a Showplan Report


QUERY PLAN FOR STATEMENT 1 (at line 1)

STEP 1
The type of query is SELECT.

FROM TABLE
ComOrder
Nested iteration.
Index : qtord_ix
Forward scan.
Positioning by key.
Keys are:
qtord ASC
Using I/O Size 2 Kbytes for index leaf pages.
With LRU Buffer Replacement Strategy for index leaf pages.
Using I/O Size 16 Kbytes for data pages.
With LRU Buffer Replacement Strategy for data pages.

FROM TABLE
ComReport
Nested iteration.
Index : bftkey
Ascending scan
Positioning by key.
Index contains all needed columns. Base table will not be read.
Keys are:
orderId ASC
Using I/O Size 2 Kbytes for index leaf pages.
With LRU Buffer Replacement Strategy for index leaf pages.
FROM TABLE
ComPortfolio
Nested iteration.
Table Scan.
Forward scan.
Positioning at start of table.
Using I/O Size 2 Kbytes for data pages.
With LRU Buffer Replacement Strategy for data pages.

QUERY PLAN Marks the beginning of each query plan.
STEP Sequential number for each step in each statement.
Type of Query Reports the type of query - select, insert, update.
FROM TABLE Denotes the table that is being accessed.
Nested Iteration Denotes the execution of a data retrieval loop.
Table Scan Denotes the type of table access.
Ascending scan Denotes the direction of the scan.
Positioning at start of table Indicates where the scan begins.
Using I/O Size Specifies the I/O size being used.
With LRU Buffer Replacement Strategy Denotes the buffer replacement strategy being used.

Once you discover what areas are problematic within your stored procedure or individual
query, you can take action. This ranges from adding an index and improving joins to
correcting SARGs and recreating temporary tables. Once you make any changes, you should
immediately see if they improve performance. Remember, though, always check to ensure
that your changes are compatible with processing and business logic.

Using Simulated Statistics

optdiag can generate statistics that can be used to simulate a user environment without
requiring a copy of the table data. This permits analysis of query optimization using a very
small database. Moreover, loading simulated statistics into a database does not replace the
existing statistics. When using simulated statistics, you must instruct the optimizer to use
them:

set statistics simulate on

Measuring Stored Procedures at Run Time

There are several ways to measure stored procedures, including using a stopwatch. One quick
way to measure stored procedure performance, especially for long-running stored procedures, is
to build a simple measuring mechanism into the procedures themselves.
You do this by creating two tables, one to hold the start time of the stored procedure and the
other to hold the end time. An identity column on the start table allows you to match up the
start and end times in the two tables.

Create table Proc_start_history
    (id numeric identity,
     date datetime,
     name char(20))

Create table Proc_end_history
    (id numeric,
     date datetime)

Insert the following lines at the beginning of each stored procedure that you'll measure
(hard-code the procedure's name in place of hardcoded_proc_name):

Declare @id numeric
Insert Proc_start_history (date, name) values (getdate(), "hardcoded_proc_name")
Select @id = @@identity

At the end of the stored procedure, insert the following line:

Insert Proc_end_history (id, date) values (@id, getdate())

To see the average duration and execution count of each stored procedure, you can run the
following query:

Select
    Procedure_name = s.name,
    Span = avg(datediff(second, s.date, e.date)),
    Times = count(*)
from
    Proc_start_history s,
    Proc_end_history e
where
    s.id = e.id
group by s.name
order by s.name

The entries in the two tables are matched on their IDs, so you'll be able to see how long each
procedure takes to run and how often it runs. We suggest doing this during testing only, since
there will be a performance penalty if it is implemented in a production environment.

Checking for Join Columns and Search Arguments

In most cases, Adaptive Server uses only one index per table in a query. This means that the
optimizer must often choose between indexes when there are multiple WHERE clauses
supporting both search arguments and JOIN clauses. The optimizer first matches the search
arguments to available indexes and statistics and estimates the number of rows and pages that
qualify for each available index.

The most important item that you can verify using dbcc traceon(302) is that the optimizer is
evaluating all possible WHERE clauses included in the query. If a SARG clause is not
included in the output, then the optimizer has determined it is not a valid search argument. If
you believe your query should benefit from the optimizer evaluating this clause, find out why
the clause was excluded and correct it if possible.

Once all of the search arguments have been examined, each join combination is analyzed. If
the optimizer is not choosing a join order that you expect, one of the first checks you should
perform is to look for the sections of dbcc traceon(302) output that show join order costing:
There should be two blocks of output for each join.

If there is only one output for a given join, it means that the optimizer cannot consider using
an index for the missing join order. Check your SARGs to see if they are used correctly in
datatypes, indexes, etc.
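A minimal sketch of turning the trace on for a single query; dbcc traceon(3604) sends the
trace output to the client session, and the query shown is illustrative:

dbcc traceon(3604)
go
dbcc traceon(302)
go
select Client_name from Client where Client_volume = "10000"
go
dbcc traceoff(302)
dbcc traceoff(3604)
go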

Avoiding Stored Procedure 'Hot Spots'

Performance issues associated with stored procedures may arise when a stored procedure is
heavily used by one or more users and applications. When this happens, the stored procedure is
considered a hot spot in the path of an application, since it may slow down processing.
Rewriting the query, shortening transactions, or changing the processing is usually the most
effective solution. If you are not able to do any of the above, you can try changing the
execution priority of the stored procedure. Usually, the execution priority of the applications
executing the stored procedure is in the medium to low range, so assigning more favorable
execution attributes to the stored procedure itself might improve performance for the
applications that call it.
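As a rough sketch only: execution attributes are managed with the sp_addexeclass and
sp_bindexeclass system procedures. The class and procedure names below are hypothetical, and
the exact object-type code and scope arguments for binding a stored procedure should be
verified against the sp_bindexeclass documentation for your version before use:

-- Define a high-priority execution class, then bind the hot procedure to it
sp_addexeclass "EC_HOT_PROC", "HIGH", 0, "ANYENGINE"
go
sp_bindexeclass "usp_hot_proc", "PR", "dbo", "EC_HOT_PROC"
go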

Improving Stored Procedure Performance

Here are suggestions to improve stored procedure performance:

1. Organization - Organization and order are vital parts of the stored procedure
performance improvement process. Are your stored procedures documented? Is there
some sort of version control? Are you using naming conventions? Are you
documenting performance testing? These are all very important when 100 users want
to know why their queries aren't working. Organization is a vital part of this process.
2. Smaller sets of data - If you're working on a large table and processing is slow,
analyze your needs. Can you select only the necessary data into temporary tables and
do some of the processing on those smaller tables?
3. Isolate bottlenecks - Using the tools we described above, extended stored procedures,
and monitor server, you can find out which procedures cause the most bottlenecks. If
your job is to find out where stored procedures are getting 'stuck,' start by finding out
who's running them, when, and at what instances. Try to find out which stored
procedures are the biggest burden on the server. Optimizing their performance will
alleviate some of the workload on the server and will allow you to concentrate on your
next step with fewer users at your door.
4. Indexes - Indexes can improve performance of stored procedures if the optimizer can
use them. A large number of indexes on a single table can also slow down
performance of data modification statements. Analyze your needs and your ability to
add indexes, and use them where you can. In addition, if you have indexes that are not
being used, but you think they should, analyze them as well. Are the keys too big? Are
your indexes explicitly defined as unique? Is your code preventing the optimizer from
using the index?
5. Server configuration - Server configuration can play a vital part in query and stored
procedure processing. Are your queries doing table scans? Are you using large I/O? If
you use a 16K I/O instead of a 2K I/O to read large sets of data, you can gain
considerable performance benefits.
6. SARGs - Search arguments play a very large role in stored procedure performance.
Check SARG guidelines, and make sure you're not misleading the optimizer.
7. Suboptimal query plans - A plan for a specific query may not be correct if there are
different input parameters, new indexes have been added, or the size of the table
referenced has changed. Update index statistics (and inner column statistics where
possible) and recompile.
8. Temp tables - Create and populate temp tables in one procedure and process them in
another. Do not create, populate, and select data from a temp table all in the same procedure.
9. Reduce locking - Keep your transactions as short as possible and your queries in the
same order to avoid deadlocks. If possible, access tables in the same order.
10. Don't panic - This is probably the most important advice in this chapter. If you can do
this, you can do the first nine with ease. It's all part of a learning process, which we are
all involved in. Practice makes perfect, and it's always a good idea to look at some
stored procedures in your development environment and see if they are 'optimizable,'
according to some of the suggestions in this chapter.
11. Always remember the rule of diminishing returns - Once you start optimizing the
query, you will reach a point where you cannot get any better performance unless you
drastically change the architecture of database design.

Case Study

Recently, a stored procedure that had always behaved well (less than one minute) showed an
amazing degradation in performance (over ten minutes). When I investigated using showplan and
statistics io, I found that it had stopped using an index on a major table. I looked at the
table and did a few selects, only to discover that the indexed column had taken on thousands
of null values in the past few months. The column was now 45 percent null, with highly
selective values in the remaining 55 percent.

I ended up dropping and recreating the procedure, this time forcing the index in the query,
and performance returned to normal.

Chapter 12: Locking


Overview

In Chapter 1 we talked about a variety of ways to define performance. Concurrency, the


number of processes that can simultaneously access data, is one way. The type and granularity
of the locks applied during processing will determine the level of concurrency your
application achieves. Sometimes you have control over this, and sometimes you do not. In this
chapter, we will discuss all aspects of locking:

• Consistency levels
• Lock isolation levels
• Lock granularity
• Types of page locks
• Data locking mechanisms
o Allpages
o Datapages
o Datarows

Why Objects are Locked

The role of the DBMS is to manage data. This includes, among other things, security and data
integrity. We will focus on integrity.

What would be the effect of writing two rows to the same place? What would be the effect of
allowing two processes to update the same row at the same time? How about allowing one
process to read a row when the change is not yet permanent (committed) in the database?

From a basic transactional level, Adaptive Server automatically locks pages that are being
modified until modifications are complete. All modified pages for a transaction remain locked
until the transaction either completes or rolls back. Pages that are being modified cannot be
read, and pages that are being read cannot be modified.

ANSI Transaction Isolation Levels

The ANSI SQL-92 standard defines four levels of isolation for transactions. Each isolation
level describes the types of actions that are and are not permitted while concurrent
transactions are executing.

Level 0

Transaction isolation level 0 allows dirty reads. This means that Adaptive Server will permit
reads of uncommitted data. This is sometimes used in systems where 'close' numbers are as
acceptable as 'exact' numbers.

Level 1

Transaction isolation level 1 prevents dirty reads. Adaptive Server implements this level with
exclusive locks on modified data: the object being locked is accessible only by the process
that placed the lock, and no other readers or writers are permitted until the transaction
completes. This is the default Adaptive Server transaction isolation level.

Level 2

Transaction isolation level 2 prevents non-repeatable reads. It keeps a second transaction from
modifying a row previously read within another transaction. In other words, any row that you
read will remain locked for the duration of your transaction so that no other process can
update the row. Transaction isolation level 2 includes transaction isolation level 1 restrictions.
This is not recommended because it causes contention and increases the likelihood of
deadlock incidences.
Level 3

Transaction isolation level 3 prevents phantom reads. In other words, it prevents the
transaction from returning two different result sets for the same search criteria. Transaction
isolation level 3 includes transaction isolation level 2 restrictions. This is not recommended
because it causes contention and increases the likelihood of deadlock incidences.

Default Isolation Level

The Sybase Adaptive Server default isolation level is 1. This gives you the highest level of
concurrency that will also protect uncommitted transactions.

The ANSI standard specifies a default isolation level of 3 for transactions, which you can set
in Adaptive Server with the set command. Note that this is not recommended if you need
maximum concurrency.

Setting Session Isolation Levels

Isolation levels can be set for a session.

Syntax:

set transaction isolation level X

Effects:

Isolation Level Description


0 Read uncommitted is applied to all select statements.
1 This is the default Adaptive Server behavior. Read committed is
applied to all selects.
2, 3 Automatically applies holdlock to all select statements in a
transaction (note that this is not a recommended behavior, as it leads
to a variety of deadlocking situations).

To determine the current transaction isolation level during a session, examine the
@@isolation global variable.

Example:

select @@isolation
go
Displayed Level Meaning
0 dirty reads (system 11 and later)
1 isolation level 1
3 isolation level 3
Setting Statement Isolation Level

Isolation behavior can also be adjusted for an individual select statement. You may, for
example, want a single select within a transaction not to hold its shared locks for the
duration of the transaction, without affecting the rest of the transaction. You do this by
adding noholdlock to the select statement.

Syntax:

select column_list from table_name [holdlock | noholdlock | shared]
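A minimal sketch; the table and column names are illustrative:

-- Do not hold shared locks on account for this statement, even if the
-- transaction is running at a higher isolation level
select balance
from account noholdlock
where acct_number = 25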

lock table Command

You also have the ability to issue the lock table command within a transaction. This is new for
11.9.2.

Syntax:

lock table table_name in {share | exclusive} mode [wait [numsecs] | nowait]

Option Description
share | exclusive Specifies the type of lock, shared or exclusive, to be
applied to the table.
wait numsecs Specifies the number of seconds to wait if a lock
cannot be acquired immediately. If numsecs is
omitted, this specifies that the lock table command
should wait until lock is granted.
nowait Causes the command to fail if the lock cannot be
acquired immediately.

Example:

begin transaction
lock table titles in share mode

This locks the titles table but still permits concurrent access by other read-only processes.

Lock Granularity

Granularity is the minimum amount of data that is locked as part of a query or update. The
smaller the lock size, the greater the number of potential concurrent users. Using a smaller
lock size, however, means greater overhead and resource usage since you will acquire more
locks. The greater the lock size, the less overhead that is required to manage locks, but fewer
concurrent users can access the data.

Adaptive Server balances this by using page-level locking as the default. Starting in 11.9.2,
you can also implement row-level locking in order to increase concurrency; we will discuss
the types of locks that ASE supports in more detail in the next section. Be warned, though,
and carefully review whether the number of locks configuration parameter is adequately
sized before you implement row-level locking in production. There is no good excuse for
running out of locks in the middle of the trading day! In addition to the usual locks you'll
see (shared, exclusive), there are two other types of locks - demand and update.

An update lock is a deadlock avoidance mechanism. The server places update locks on all
requisite pages before escalating to an exclusive lock to prevent deadlocking. Demand locks
are used by ASE to indicate that a transaction is next in queue to lock a table. They are
applied when an update lock has been waiting in the queue to escalate to an exclusive lock,
and three read processes have been permitted by ASE to skip over that update process and
apply shared locks on the table. ASE gives a demand lock to the update process, and the
demand lock forces all subsequent read processes to queue behind the update. However, the
update process will not escalate until the existing shared locks are released. Unless you
override the standard locking, Adaptive Server locking is automatic. Adaptive Server will
typically lock at the table, page, or row level, depending upon the interplay between table
definition, SQL statement, and table indexing.

System tables and indexes are locked using a more sophisticated system (resource locks) and
may not be locked at all if you have set row-level locking or datapages locking on for the
table.

Types of Page/Row Locks

At the page and row level, you will see four basic types of locks at page access time - shared
locks, exclusive locks, update locks, and demand locks (demand locks are actually pretty
rare). A shared lock means that any process can read, none can write, and there is no limit to
concurrent users. An exclusive lock means that only the process which owns the lock can read
or write. No other concurrent users are permitted unless the isolation level is set to 0. Update
locks are used by Adaptive Server as a 'bookmark' to help it avoid deadlocking. It allows other
shared locks, but no other update locks or exclusive locks. Demand locks are used to obtain a
page/row for exclusive use when combating a steady stream of shared locks.

Table Locking

Adaptive Server will try to lock at the page or row level (depending on configuration) to
maximize concurrency. As the number of locks acquired on a table by a process continues to
increase, Adaptive Server may escalate to a table lock. Regardless of whether ASE is
acquiring page or row level locks, escalation is always to table level locks. Row level locks do
not escalate to page locks. At the table level, you will see shared locks, exclusive locks, and
intent locks.

Shared table locks permit other readers (at row, page, or table level) but no other exclusive
locks. They are acquired when:

• The holdlock command option or lock table command is specified.


• The SQL statement does not have a corresponding index or the SQL is non-
SARGable.
• A non-clustered index is being created.
• Lock escalation has occurred.

Exclusive table locks are used for updates and deletes if there is no index available, if the
SQL is non-SARGable, and during the creation of a clustered index. Intent (shared or
exclusive) indicates at the table level what types of locks are being acquired at a lower level.
This is used for deadlock avoidance (discussed in Chapter 13).

ASE Locking Schemes

There are three locking schemes that Adaptive Server supports - allpages, datapages, and
datarows.

The default locking mechanism (and prior to 11.9.2, the only locking scheme) is called
allpages locking. This means that a lock will be held for the duration of the transaction on all
pages being modified, including index pages. From a system perspective, this is the least
expensive locking strategy since only one lock is acquired per page. From a concurrency
perspective, however, this can be the most restrictive, as there is frequently contention on data
and, especially, index pages.

Datapage and datarow locking represent the opposite paradigm. They require more resources
since they acquire more locks but will support far more concurrency, especially during update
and delete processing. There are two reasons for this. First, locking is far more granular,
especially in the case of datarow locking. Second, these two locking strategies employ
latches. Latches are non-transactional in nature; that is, they are held only long enough to
make the change to the data or index page and then immediately released. In the case of
datapage locking, the latches are applied to the index pages, which is typically the source of
most intra-process deadlocking. For datarow locking, latches are used on both the data and
index pages.

Allpages Locking

Assume the authors table uses allpages locking. We are inserting a row into the table as follows:

insert authors (lname,fname) values ('Poe','Edgar')

Adaptive Server is going to lock the data page, as well as the index pages that must be
updated, to reflect the additional rows. If pages need to split or combine, additional pages may
need to be locked.

Datapages Locking

When locks are being placed on a table using datapages locking, locks are held only on data
pages. Updates to index pages are performed using latches, which only exist for the duration
of the physical change. Transactional locks are not held on the index pages.
Datarows Locking

When locks are being placed on a table using datarows locking, locks are held only on
individual rows. Updates to data pages place a latch on the page. Updates to index pages are
also performed using latches, which only exist for the duration of the physical change. No
transactional locks are held on the index or data pages.

Specifying a Locking Scheme

You may configure locking schemes at the server level or table level. We recommend that
you leave locking at the installed server default, allpages, and only select alternate locking as
the need arises.

Datarows or datapages locking creates a physical overhead of approximately 40 percent of
storage, so we recommend that in any substantially sized database, you upgrade locking
schemes selectively.

Server Level Locking

Configure the server level default locking scheme using the sp_configure command as
follows:

Syntax:

sp_configure "lock scheme",0, {allpages | datapages | datarows }

This sets the default locking scheme for new tables to be created. Note that it has no effect
on tables that already exist.

Table Level Locking

You can specify the locking scheme for a new table within the create table or alter table
command.

Syntax:

Create table table_name (column_list)
[ lock {allpages | datapages | datarows} ]

Alter table table_name
lock {allpages | datapages | datarows}

Note that the alter table lock command runs faster with up-to-date statistics.
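For example, to convert an existing table to datarows locking (running update statistics first
so the conversion has current statistics to work with):

update statistics titles
go
alter table titles lock datarows
go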

Concurrency Issues
Allpages Locking

Allpages locking locks data and index pages. Comparatively, it offers the least concurrency of
the locking behaviors, although it is sufficient for most applications. This is the default
behavior, and it is also the only behavior prior to 11.9.2.
Advantages:

• Simple and fast.


• Takes the fewest locks.
• Simplifies recovery, since only one transaction can modify a page at a time.

Disadvantages:

• Concurrency is decreased at update time.


• Index or data page splits require many locks.
• There may be index and heap table hot spots.
• This scheme has the most deadlock potential.

Use this unless you've positively identified a problem. It's the fastest and uses the least
storage, memory overhead, and row migration overhead (it doesn't need to be managed with
the reorg command).

Datapages Locking

Datapages locking locks data pages but not index pages. There is more concurrency than with
allpages locking, and it eliminates concurrency issues at the index level.

Advantages:

• No transaction duration locks on the index pages.


• Fewer locks taken than row-level locking.

Disadvantages:

• You may still have data page hot spots.


• Requires more storage than allpages locking.

Use datapages locking when data-only locking is required, and the number of locks is an
issue.

Datarows Locking

Datarows locking locks data at the row level. It is used when you need the most concurrency.

Advantages:

• No transaction duration locks on index or data pages.


• Greatest concurrency.
• Relieves hot spots.

Disadvantages:

• Larger number of locks necessary (50 to 100 percent more).


• Requires more storage than allpages locking.
Use datarows locking when contention is a factor, especially if the workload is primarily
deferred updates. You can also use it when space is an issue but the table is small and in high
contention. Also use it when deadlocks are an issue.

Configuring Lock Promotion

By default, Adaptive Server escalates page/row locks to a table level lock when more than
200 locks have been acquired on a table in a command. This threshold can be adjusted by
manipulating the lock escalation thresholds. You can define these lock escalation thresholds at
the table, database, or server level.

There are three lock promotion thresholds that can be set:

• lock promotion hwm (high water mark) - The lock promotion hwm sets a maximum
number of locks that can be held on a table without triggering lock escalation.
• lock promotion lwm (low water mark) - The lock promotion lwm sets the number of
locks below which ASE will not try to issue a table lock on an object.
• lock promotion pct (percentage) - The lock promotion pct sets the percentage of
locks, based on the table size, above which Adaptive Server attempts to escalate page
level locks to a table level lock. Adaptive Server does this whenever the number of
locks acquired on an object is greater than the LWM but less than the HWM.

Take a look at the previous diagram. As the number of locks increases past the low water
mark (LWM), the server will consider escalating to a table level lock. If it exceeds the HWM,
the server must try to escalate to a table level lock.

What if the number of locks is between the LWM and the HWM? In that case, if the percentage
of the entire table that is locked exceeds the lock promotion PCT, the server must try to escalate.

Remember that lock escalation is not a guaranteed process. If a process reaches the point
when escalation is warranted, based either on the HWM or the LWM+PCT, ASE may not be
able to escalate because incompatible locks are being held on the same table by one or more
additional processes. In that case, the process that wanted to escalate will continue to acquire
page/row locks until the process completes or the server runs out of locks. I guarantee that
before this happens, your users will run out of patience when their processes take longer and
longer to acquire and release locks.
Setting Lock Promotion Values

To set the default lock escalation thresholds for all tables in Adaptive Server, use either the
sp_configure stored procedure (for server-wide effects) or the sp_setpglockpromote (for
database- or table-specific effects).

Syntax:

sp_setpglockpromote "server", NULL, new_LWM, new_HWM, newPCT

sp_configure "{lock promotion HWM |


lock promotion LWM |
lock promotion PCT}", value

Examples, setting at the server level:

sp_setpglockpromote "server", null, 200, 1000, 75

sp_configure "lock promotion HWM", 500

The first example sets the low water mark to 200, the high water mark to 1000, and the
percentage to 75 for the entire server. The second example changes the high water mark for
the server to 500.

Examples, setting at the database or table level:

To increase the thresholds specifically for the database containing those tables, or for the
tables themselves, use sp_setpglockpromote.

sp_setpglockpromote "database", pubs2, 200, 1000, 75


sp_setpglockpromote "table", titles, 200, 1000, 75

The first example sets configuration values for the pubs2 database. The second sets the
promotion values for the titles table. Any settings at the table level override the database and
server settings. To remove table or database lock promotion settings, use the
sp_droplockpromote stored procedure.
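For example, to drop the table-level setting created above so that the titles table falls back to
the database or server thresholds:

sp_droplockpromote "table", titles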

Configuring Locks

The total number of concurrent locks on a server is configurable by the system administrator.
The default will change according to platform and version, but it is typically 5000. Increasing
lock promotion thresholds may increase the number of concurrent locks held, resulting in an
'out of available locks' error. You should increase the locks configuration parameter
accordingly.

Syntax:

sp_configure locks, n

Where n = the maximum number of locks.

Note The rule of thumb is to assign 20 locks per concurrent process. If you have row-level
locking, we recommend at least twice that number.
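As a rough sizing sketch (the figures are illustrative), a server supporting 200 concurrent
processes with row-level locking on its busiest tables would be configured for at least
200 * 20 * 2 = 8,000 locks, using the full parameter name, number of locks:

sp_configure "number of locks", 8000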

Cursor Locking

For cursors declared with the read only clause, a shared lock is placed on the current page or
row the cursor is pointing to (depending on the locking scheme chosen). For cursors declared
with the for update clause and without the shared keyword, an update lock is placed on the
current page or row the cursor is pointing to. The lock is changed to an exclusive lock if the
page is modified and is then held until the transaction is committed.

If the update statement is not within a multi-statement transaction (i.e., a begin tran/commit tran
pair), the exclusive lock reverts to an update lock once the update is committed. If you declare
the cursor with the shared keyword (recommended, but not the default), the server will use
shared locks instead of update locks. This is useful when the cursor contains a join and you will
be updating only one of the tables, so you do not want update locks held on both tables.
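A minimal sketch using the pubs2 titles table (the column and key values are illustrative):

declare titles_crsr cursor for
    select title, price
    from titles shared
    where pub_id = "1389"
    for update of price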

Summary

The Adaptive Server does its own locking to manage concurrency. It defaults to page-level
locking but also supports row-level locking. These locks may be escalated if the server finds it
expeditious. Conflicting locks are handled automatically, either by a wait state or, in the case
of a deadlock, a process termination.

Chapter 13: Deadlock


There is an old story about two Japanese samurai standing in the rain. Each has taken a strong
position, so neither moves because a move will relinquish the position, and the warrior will
lose the advantage. In the end, the samurai just stand in the rain and wait for the other to
move. Potentially, this could go on for a long time.

Deadlock is a similar concept. Essentially, a deadlock occurs when at least two connections
vie for the same resources, and the locks each connection holds prevent the other from moving
forward. Each process appears to stall because it is waiting for the opposing process to
complete its next step, but the opposing process is doing exactly the same thing.

What is Deadlock?

Deadlock is related to the locking mechanism inherent in all databases, which means that
every database will experience some deadlocks; the problem cannot be totally avoided.
However, the occurrence of deadlock can be minimized through the use of carefully written
queries and stored procedures and by following some relatively basic rules. Even a select with
group by or aggregate functions can cause hidden deadlock in tempdb.

Various Forms of Deadlock

In this example of a classic deadlock, we have two opposing transactions.


Step 1: Process 1 modifies (and obtains an exclusive lock on) Page #A.

Step 2: Process 2 modifies (and obtains an exclusive lock on) Page #B.

Step 3: Process 1 attempts to acquire Page #B, but it can't because of Process 2's exclusive
lock.

Step 4: Process 2 attempts to acquire Page #A, but it can't because of Process 1's exclusive
lock.

This classic example is shown with page locks, but the same deadlock can occur with table- or
row-level locking. The granularity of the lock is irrelevant; what matters is that each process
holds an exclusive lock the other needs, which creates the dilemma of deadlock.
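A minimal sketch of this pattern against the pubs2 tables (the key values are illustrative); if the
two sessions run concurrently, each ends up waiting for a lock the other already holds:

-- Session 1
begin tran
update authors set phone = phone where au_id = "172-32-1176"
update titles set price = price where title_id = "BU1032"
commit tran

-- Session 2
begin tran
update titles set price = price where title_id = "BU1032"
update authors set phone = phone where au_id = "172-32-1176"
commit tran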

Deadlock with Holdlocks

The following figure presents an example of a deadlock that can commonly occur when using
holdlock. Holdlock is used to prevent shared locks from being released until a transaction is
committed or rolled back. It is often used when a transaction first takes a shared lock and then
later requires an exclusive lock to modify the retrieved data. Normally, the shared locks are
released as soon as the data is retrieved, which would allow other transactions to obtain shared
locks that prevent the data from being changed, or even to obtain exclusive locks and modify
the data before the first transaction returns to it.

In this example, process T1 obtains a shared lock but does not release the lock. Process T2
also obtains a shared lock and, likewise, does not release the lock. When T1 attempts to
change the data, it cannot obtain the required exclusive lock. It must wait for T2 to release its
lock to make the changes. When T2 attempts to change the data, it is also stuck because it
cannot obtain the required exclusive lock. T2 must wait until T1 relinquishes its shared locks.
Both processes are waiting for each other and cannot move until the other completes its work.
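A minimal sketch of this pattern (both sessions run the same batch against the pubs2 titles
table; the values are illustrative):

begin tran
select price from titles holdlock where title_id = "BU1032"
-- both sessions now hold a shared lock on the same page or row;
-- each then blocks trying to upgrade to an exclusive lock here:
update titles set price = price * 1.1 where title_id = "BU1032"
commit tran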

Front-End Deadlock

There are situations where there might appear to be a deadlock in progress.

In this example, process T1 is stuck waiting for process T2 to release its resources. However,
process T2 does not seem to need any of T1's resources. Since the process could be completed
when T2 commits its transaction, it is not a real deadlock. This type of deadlock often occurs
when there is user interaction from the front end of the application during a transaction.
Error 1205

Adaptive Server is already designed to handle deadlocks. By default, Adaptive Server will
check for deadlocks every 500 milliseconds and will abort the transaction with the least
amount of CPU time if a deadlock is found. The remaining process will be allowed to
complete. The aborted process will receive an Error 1205, or if the transaction was initiated
by a stored procedure, a return code of -3 will be returned. Note that this does not abort the
batch, just the transaction. Applications should be written to accommodate the 1205 error. For
instance, an application can be written to retry the transaction a certain number of times
before the application decides to abort the process.
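A hedged sketch of such retry logic in Transact-SQL (update_balance and its parameters are
hypothetical):

declare @tries int, @status int
select @tries = 3
while @tries > 0
begin
    execute @status = update_balance @acct = 1001, @amount = 50
    if @status != -3   -- -3 means the transaction was chosen as the deadlock victim
        break
    select @tries = @tries - 1
end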

Configuration Options

print deadlock information is a dynamic configuration option that can be modified using the
sp_configure system procedure. When it is enabled and a deadlock occurs, Adaptive Server
writes information to the errorlog about the occurrence and how the deadlock was handled.
The output will list the suids of the users involved, the types of locks held, the objects
involved, and the SQL statements that caused the deadlock. This is usually enough information
to determine the cause of the deadlock.
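To turn the option on (the change takes effect immediately):

sp_configure "print deadlock information", 1

The errorlog output following a deadlock then looks like this: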

00:00000:00012:2001/05/31 15:04:35.92 server The configuration option
'print deadlock information' has been changed by 'SysAdmin' from '0' to '1'.
00:00000:00014:2001/05/31 15:05:14.20 server Deadlock Id 2 detected
Deadlock Id 2: detected. 1 deadlock chain(s) involved.

Deadlock Id 2: Process (Familyid 9, 9) (suid 1) was executing a SELECT
command at line 1.
SQL Text: select * from titles
Deadlock Id 2: Process (Familyid 14, 14) (suid 1) was executing a SELECT
command at line 1.
SQL Text: select * from authors
Deadlock Id 2: Process (Familyid 0, Spid 14) was waiting for a 'shared
page' lock on page 728 of the 'authors' table in database 4 but process
(Familyid 9, Spid 9) already held a 'exclusive page' lock on it.
Deadlock Id 2: Process (Familyid 0, Spid 9) was waiting for a 'shared page'
lock on page 808 of the 'titles' table in database 4 but process (Familyid
14, Spid 14) already held a 'exclusive page' lock on it.

Deadlock Id 2: Process (Familyid 0, 9) was chosen as the victim. End of
deadlock information.

This sample of deadlock messaging demonstrates the output for a deadlock similar to the
classic deadlock example. In this case, the spid 9 that locked the authors table is chosen as the
victim and its transaction is aborted.

Deadlock Checking

The default interval for checking deadlocks is 500 milliseconds. This is adequate for many
installations, but on a server that is reasonably deadlock free, this frequent checking of
deadlocks might be considered overhead. By increasing the deadlock checking period value,
the minimum wait between deadlock checks will be increased and the check will not occur as
often.
Example:

sp_configure "deadlock checking period", 700

Avoiding Deadlocks

Can deadlocks be eliminated? In a word, no.

As mentioned at the beginning of the chapter, deadlocks are inherent in the design of the
locking scheme and cannot be eliminated. However, the occurrence of deadlocks can be
minimized by using the following techniques:

1. Try to do as much work as possible within stored procedures. While the use of stored
procedures alone does not do much to avoid deadlocks, it does decrease the odds.
Stored procedures generally run faster than straight SQL code. The speed of the
execution may sometimes be enough to release the necessary resources before a
deadlock occurs.
2. Acquire locks on objects in the same order. In the classic example described in this
chapter, one process modifies the authors table and then selects from the titles table,
while the second process does the exact opposite. The table order of the two processes
is in direct opposition, and that is what creates the deadlock. If both processes accessed
and modified their tables in the same order, a deadlock would be much less likely.
3. Avoid the use of holdlock. As described earlier, holdlock seems to work best when
only one person is using a holdlock process at a time. If more than one person tries to
use the same holdlock process, the process will generate deadlocks.
4. Split the long running transaction into many small transactions.
5. Make sure the right indexes exist on the table, and consider using a named cache to
avoid extra I/O. Bigger transactions cause more deadlocks and make the system suffer
more.
6. Every time a temporary table is created or dropped, system tables in tempdb must be
updated. Frequently creating and dropping temporary tables can cause severe locking
contention on the sysobjects, syscolumns, and sysindexes system tables. Here are
some solutions:
o Try creating the tables at the start of the application.
o Use insert..select to populate them, and truncate table to remove all the rows.
o Instead of using select *, only select the columns you need for the query.

There are always trade-offs:

o Creating the table at the start of the application and then using insert..select and
truncate will increase the amount of logging and contention on the log.
o Unlike insert..select, select into is a minimally logged operation, but it locks
the table for the duration of the transaction.
7. On a high throughput system, deadlocks can be reduced greatly if replication
server/replicated database is used. Reporting and user select-only queries can run
against the replicated database and all inserts/updates go on the primary database.
8. Convert the application from a multiple-writer system to a single-writer system.
Funneling all writes through a single queue, as application servers and IBM MQ do,
can reduce deadlocks in the system.
9. Reduce user interaction and network usage (RPC calls) between begin tran and
commit tran.
10. Deadlocks can be reduced in some applications by using row-level locking on the
underlying tables rather than the default APL (allpages) lock scheme. If a set of tables
is accessed heavily by various processes, and those processes acquire page locks on the
data, the chances of hitting a deadlock increase. It may be worthwhile to change the
locking scheme of these tables from page-level to row-level locking, which provides a
more granular locking mechanism and reduces the chances of deadlock.
11. Check for timeouts and blocking with sp_block or a query like the following:

    select
        p.spid,
        suser_name(p.suid),
        p.blocked,
        suser_name(p1.suid),
        db_name(p.dbid),
        p.status,
        p.program_name,
        getdate()
    from master..sysprocesses p, master..sysprocesses p1
    where p.blocked > 0
      and p.blocked *= p1.spid

Summary

Deadlocks are a part of life. To reduce deadlocks, identify the underlying application causes
and make the appropriate changes.

Chapter 14: Configuration


Sybase ASE uses configuration parameters to control various aspects of Adaptive Server
operation and performance. Each parameter is initialized to a default value when Adaptive
Server is installed. The defaults are listed in the system administration manual, and you
should carefully review each default to ensure it's appropriate for your system. While it is
easy to identify certain parameters that almost always require modification (did anyone really
run with the default total memory value in pre-12.5 servers?), some other parameters may be
more difficult to pinpoint. Administrators often overlook parameters that could improve their
server's performance. You should review your server's parameters, especially those that affect
user processes and memory allocation and use. To see which configuration parameters use
memory, execute sp_configure "Memory Use" for a listing.

In the following sections, we will discuss different configuration parameters that have been
added to ASE since version 11.5. Since most relevant publications deal with the Sybase server
when it was still called SQL Server, we will add some information on parameters that have
appeared since the 11.0 days and how one should deal with them.

How to View Configuration Settings

The sp_configure system procedure allows you to view all configuration settings for the
server. Since there are over 100 configuration parameters, starting with Sybase 11 they have
been split into logical groups such as Disk I/O, Backup/Recovery, Cache Manager, etc. The
configuration output displays the following:

Group: [group name]

Parameter Name                 Default     Memory Used Config Value Run Value
------------------------------ ----------- ----------- ------------ ---------

• Group - The name of the group of logically related parameters. A parameter can
appear in multiple groups if it is logically related to each of them; 'number of open
indexes,' for example, appears in 'SQL Server Administration,' 'Memory Use,' and
'Meta-Data Caches.'
• Parameter Name - The name of the configuration parameter
• Default - The default value of the parameter when the server is first installed
• Memory Used - The amount of memory the current settings of the parameter consume
(as specified under the Run Value column)
• Config Value - The value of the parameter if it was changed from the default value
but has yet to take effect (i.e., requires the startup of the server)
• Run Value - The value of the configuration parameter when the server is started or the
actual value of the parameter if it is a dynamic option, in which case the config value
and run value will always be the same.

Display Level

Sybase ASE versions 11.5 and above contain a system procedure, sp_displaylevel, that allows
you to change the display level for each login. For example:

sp_displaylevel "user","basic"
sp_displaylevel "dbadmin","intermediate"
sp_displaylevel "sa","comprehensive"

The display level can be basic, intermediate, or comprehensive, and it will eliminate certain
parameters that a user may not be interested in. The default setting is comprehensive, which
will show all configuration parameters.

sp_helpconfig is another option that can assist users when viewing configuration parameters.
sp_helpconfig shows the same information as sp_configure but with the minimum and
maximum values that can be set for the specific parameter. For example:

[28] ANGEL.master.1> sp_configure "number of user connections"
[28] ANGEL.master.2> go
Parameter Name                 Default     Memory Used Config Value Run Value
------------------------------ ----------- ----------- ------------ ---------
number of user connections     25          11916       150          150
[29] ANGEL.master.1> sp_helpconfig "number of user connections"
[29] ANGEL.master.2> go
Number of user connections sets the maximum number of user connections that can be
connected to the SQL Server at one time.

Minimum Value Maximum Value Default Value Current Value Memory Used
------------- ------------- ------------- ------------- ------------
            5    2147483647            25           150        11916

Configuration options can also be viewed by groups, for example:

[34] ANGEL.master.1> sp_configure "Meta-Data Caches"
[34] ANGEL.master.2> go

Group: Meta-Data Caches

Parameter Name                 Default     Memory Used Config Value Run Value
------------------------------ ----------- ----------- ------------ ---------
number of open databases       12          433         12           12
number of open indexes         500         208         500          500
number of open objects         500         464         1000         1000
open index hash spinlock ratio 100         0           100          100
open index spinlock ratio      100         0           100          100
open object spinlock ratio     100         0           100          100

Specifying the logical group name will only show the parameters associated with the group.

You can also locate a parameter by specifying only part of its name and letting the server do
the rest of the work:

[35] ANGEL.master.1> sp_configure "databases"
[35] ANGEL.master.2> go
Parameter Name                 Default     Memory Used Config Value Run Value
------------------------------ ----------- ----------- ------------ ---------
number of open databases       12          433         12           12

How to Change Configuration Values

All configuration parameter changes can be performed by the sa user. A user with SSO
(System Security Officer) privilege can modify parameters related to security, such as all
password-related parameters, audit parameters, etc.; SSO is implicitly granted to sa login.

One way to change configuration parameters is interactively, meaning directly on the server,
with the sp_configure command. To change a parameter, simply execute:

sp_configure "parameter name", value


sp_configure "number of open databases", 20

As discussed previously, some of these changes are dynamic (they take place upon execution)
and some are static (they take place upon a restart of the Sybase server). In Sybase ASE 12.5,
several configuration parameters were made dynamic, where in previous versions they were
static. We will discuss this later.

When changing configuration parameters, the server saves the old configuration file, usually
in the Sybase home directory, as server_name.001 and creates a second file named
server_name.002 with the configuration change you have just made. The changes are also
written to the configuration file, which is named server_name.cfg, and defaults to
$SYBASE/config. The server uses this file at startup to configure the Adaptive Server. The
name and/or location of the file can be overridden in the server’s RUN file using the -c
parameter, for example: -c /home/Sybase/MyConfig/Server_name.cfg.

You can use configuration files to quickly change the server to support changes in your
application profile. For example, your server might be heavily used for OLTP (online
transaction processing) during the day and for OLCP (online complex processing) during the
evening. Specifying a configuration file can improve the performance of the server due to the
wide difference in needs from the same server.

Other advantages of having an external configuration file include:

• The ability to replicate the server’s behavior across multiple servers by using the same
configuration file.
• A quick, text-based method of resolving some server problems even when the server is
not available. If you ask for too much memory, the server won’t come up. You don’t
want to have to log on to fix that problem now, do you? If there are problems, you can
simply revert back to the last configuration file by changing the reference in the run
file. Always try to introduce changes to the system in a controlled fashion. If you
change ten parameters and the server won’t start, you won’t know what the problem
is.

Specifying a Configuration File with sp_configure

You also have the ability to perform certain actions on the configuration file with:

sp_configure "configuration file", 0, "action", "file_name"

The following options are available:

sp_configure "configuration file", 0, "write", "server_name"

Creates the named file from the current configuration parameters. If the file already exists, a
message is written to the error log and the existing file is renamed using the convention
server_name.001, server_name.002, and so on. If you have changed a static parameter but
have yet to restart your server, "write" gives you the values from the Run Value column,
which are the currently running values for that parameter.

sp_configure "configuration file", 0, "read", "server_name"


Read performs validation checking on values contained in the server_name file and reads
those values that pass validation into the server. If any parameters are missing from
server_name, the current running values for those parameters are used instead.

sp_configure "configuration file", 0, "verify", "server_name"

Performs validation checking on the values in server_name.

sp_configure "configuration file", 0, "restore", "server_name"

Creates a server_name file with the values currently present in sysconfigures. Use this option
if, for example, all configuration files have been deleted and you need to generate a new copy.

A second way to change configuration values is by directly editing the file in a text editor that
can read and save the file in an ASCII format. The format for each parameter in the file looks
like this:

parameter_name = {value | DEFAULT}

[Cache Manager]
procedure cache percent = 25
global async prefetch limit = DEFAULT

When the parameter value is DEFAULT, the server uses the default value that came with the
server.

For a complete listing of configuration parameters and their usage, see the Systems
Administration Guide. The following sections will deal with configuration parameters that
were added to Sybase ASE since version 11.5.

New Configuration Features in Sybase ASE 11.5


AUDITING
Units 0 or 1 (flag)
Default 0 (disabled)

The auditing variable enables or disables auditing for the whole server. Turn on this parameter
if you’d like to use auditing on your system. The parameter is dynamic, and auditing will
begin as soon as you turn it on (by setting it to 1). Pay careful attention to auditing once you
turn on this parameter since space on your audit database can fill up quickly.

AUDIT QUEUE SIZE


Units Integer number of queues
Default 100
Minimum 1
Maximum 65535

The audit queue size variable determines the number of audit records that can be held in the
queue before being written to the audit database. Queuing audit records means that even when
many audit records are being generated, the server does not have to flush them very often. If
you record large audit trails, set this parameter to a higher value. If your audit trail is rather
small, use the default of 100 records or increase it as you deem necessary. This parameter,
which affects memory allocation, takes effect only after the server is rebooted in pre-12.5
versions. As of version 12.5, this configuration option is dynamic.

CURRENT AUDIT TABLE


Units Integer audit table number
Default 1
Minimum 0
Maximum 8

The current audit table parameter lets you keep the audit trail in multiple audit tables. Using
several audit tables means you can keep the audit trail running with minimal intervention. If
an audit table fills up, for example, you can automatically switch to the next table with a
simple threshold procedure.

Another option includes a “rolling audit,” in which you use audit tables to keep track of
different days of the week, whereby each day is written to one table. When the first day of the
next week arrives, you can truncate that day’s table and begin the rolling audit again.

This option is dynamic and takes effect immediately upon execution of sp_configure. If you
use the with truncate option, the server truncates the table when it begins using it, and all data
stored in the table is deleted.
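A minimal sketch of such a threshold procedure (audit_seg1_thresh is a hypothetical name,
and it assumes a second audit table has been set up); the parameters are those the threshold
manager passes to any threshold procedure:

create procedure audit_seg1_thresh
    @dbname varchar(30),
    @segname varchar(30),
    @space_left int,
    @status int
as
begin
    -- switch auditing to the second audit table, truncating it before use
    exec sp_configure "current audit table", 2, "with truncate"
end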

SUSPEND AUDITING WHEN FULL


Units 0 or 1 (flag)
Default 1

This option controls the behavior of the audit process when an audit device becomes full (if a
table does not have a threshold procedure). This value is dynamic; therefore, it takes effect
immediately upon execution of sp_configure. Values are either 1 (suspend the audit process
and all auditable user processes: default) or 0 (truncate the next audit table and start using it as
the current table).

ENABLE CIS
Units 1 or 0 (flag)
Default 0 (do not load)

This option specifies whether Component Integration Services, a feature that allows users to
access both Sybase and non-Sybase databases on different servers, is loaded when the server
is started.

Expanded Error Log Messages


LOG AUDIT LOGIN SUCCESS
Units 0 or 1 (flag)
Default 0 (disabled)

The log audit login success parameter allows the DBA to log successful logins to the error
log. This option is a hybrid of dbcc traceon 4001 and 4013:

dbcc traceon(4001)
00:00000:00009:2001/05/17 23:23:40.73 server Received LOGINREC

dbcc traceon(4013)
00:00000:00008:2001/05/17 23:24:38.68 server login: sa HOST, spid: 8,
kpid: 1769499 (0x1b001b)

log audit login success
00:00000:00009:2001/05/17 23:21:20.37 Logon Login succeeded. User: sa

On Windows NT servers, the login is written to the Windows NT event log if the event
logging is enabled.

LOG AUDIT LOGON FAILURE


Units 0 or 1 (flag)
Default 0 (disabled)

The log audit login failure parameter allows the DBA to log failed logins to the error log. This
option allows you to log cases of repeated failures and take action if required.

Extended Stored Procedures


ESP EXECUTION PRIORITY
Units Integer 0-15
Default 8

This option sets the priority of the XP Server thread used to execute extended stored
procedures. Depending on your needs, you may want to give your ESPs higher priority.
However, since XP Server resides on the same machine as your ASE server, and ESPs are
usually CPU intensive, this should be done with caution.

ESP EXECUTION STACKSIZE


Units Integer bytes
Default Platform dependent (on most platforms it is the minimum value)
Minimum 34816
Maximum 2147483647

The ESP execution stacksize should only be increased if you have your own ESP functions
that require a larger stack size (a reserved amount of memory used for arithmetic calculations)
than the default 34816. The option sets the size of the stack, in bytes, allocated for extended
stored procedure execution on XP Server.

ESP UNLOAD DLL


Units 0 or 1 (flag)
Default 0 (off)

If you reuse your ESPs often and would like for them to not be unloaded from the XP server’s
memory once an ESP completes, set this option to 1. The option specifies whether DLLs that
support ESPs should automatically be unloaded from the XP server’s memory after the ESP
call has completed.

XP_CMDSHELL_CONTEXT
Units 0 or 1 (flag)
Default 0

The option xp_cmdshell_context guards the operating system from users who send operating
system commands they would not otherwise be able to send under their ASE login. If you
would like to restrict users who do not have accounts at your operating system level from
executing xp_cmdshell ESPs, set this option to 1. When the option is set to 1, a user whose
login is “Mickey,” will only be able to use an xp_cmdshell ESP if the operating system has a
login by the name of “Mickey.” Review your security needs before allowing users to execute
xp_cmdshell ESPs on your operating system.

Resource Limits
ALLOW RESOURCE LIMITS
Units 0 or 1 (flag)
Default 0 (off)

The allow resource limits parameter enables the resource governor feature, a set of database
options that control the resources available to individual users. This option instructs the server
to allocate internal memory for time ranges, resource limits, and internal server alarms. It also
signals the server to internally assign applicable ranges and limits to user sessions. When this
option is set to 1 (on), showplan displays the optimizer’s cost estimate for a query. This option
is static and takes effect after the server is rebooted.

New Configuration Parameters for the NT Platform


EVENT LOG COMPUTER NAME
Units machine name
Default value 'LocalSystem'

The event log computer name option specifies the name of the Windows NT PC that logs
your very important Adaptive Server messages in its Windows NT event log.

EVENT LOGGING
Units 0 (off) or 1 (on) (flag)
Default value 1

The event logging option controls the logging of ASE messages in the Windows NT event log.
The value of 1 enables logging, while the value of 0 does the exact opposite.

START MAIL SESSION


Units 0 or 1 (flag)
Default value 0 (off)

The start mail session option controls whether a mail session is started upon ASE startup. The
default value of 0 does not start a session, while the value of 1 will cause ASE to start a mail
session upon its next startup.

SQL PERFMON INTEGRATION


Units 0 or 1 (flag)
Default value 1 (on)

The sql perfmon integration option allows the use of the Windows NT performance monitor
to monitor ASE statistics.

New Configuration Features in Sybase ASE 11.9.2


DEFAULT EXP_ROW_SIZE_PERCENT
Units Integer percent
Default 5
Minimum 0
Maximum 99
The default exp_row_size percent parameter reserves space for expanding updates in data-
only locking tables in order to reduce row forwarding. Row forwarding in DOL (data-only
locking) tables is a similar concept to page splits on allpage tables. If an update statement
increases the length of a row in a data-only locking table so that it no longer fits on the same
page, then the row is inserted onto a different page, and a pointer to the row ID on the new
page is stored in the original location for the row.

If you expect your rows to be updated and increase in size as a result, set this value higher
than the default 5 percent. If you do not expect your rows to grow in size at all, you can set
this value to as low as 0 percent, which will fill pages completely without leaving additional
room for future updates on the same page.

Row forwarding is an expensive task for the server, since it has to insert the “no longer
fitting” row to a different page and add a pointer in the original page for the new location of
the row. If you think your rows may update and grow as a result, set the value accordingly.
The parameter can only be used on DOL tables that contain variable-length columns.

Note Watch out if a value for exp_row_size was provided with the create table statement: that
value takes precedence over the server-wide default set with sp_configure.
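For example, a sketch at the table level (customer_notes is a hypothetical DOL table; here
exp_row_size is an expected row size in bytes, which overrides the server-wide percentage
default):

create table customer_notes
(cust_id int, note varchar(255) null)
lock datarows
with exp_row_size = 120
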
ENABLE HOUSEKEEPER GC
Units 0 or 1 (flag)
Default 1

The option enable housekeeper gc instructs the housekeeper task to reclaim space on DOL
tables. Reclamation of space is needed when there is unused space left on a page due to
deletions and updates, which shortens row length.

If a user deletes a row from a DOL table, for example, the space where the deleted row once
was is no longer used. The housekeeper would then check the data and index pages for the
deleted row and make the space reusable again.

This setting is somewhat problematic, since it assumes you will be using many DOL tables
with heavy deletions. When that is not the case, performance is somewhat degraded. Unless
you have many DOL tables with heavy deletions and CPU cycles to spare, change this value
to 0; reducing the expensive overhead of the housekeeper task will help the performance of
the server.

Note In addition to buffer washing, the housekeeper periodically flushes statistics to system
tables. These statistics are used for query optimization, and incorrect statistics can
severely reduce query performance. Do not set the housekeeper free write percent to 0
on a system where data modification commands may be affecting the number of rows
and pages in tables and indexes.
LICENSE INFORMATION
Units Integer Number of licenses
Default 0
Minimum 0
Maximum 2147483647

If you need to monitor the number of licenses used on the server, you can use this
informational parameter which warns about, but does not enforce, license usage on your
server. If you have five licenses, for example, you can set the parameter to 5, and when the
value is exceeded, you will receive this notification in the error log:

WARNING: Exceeded configured number of user licenses

ASE keeps track of licenses by recording the maximum number of licenses used during each
24-hour period. At the end of each 24-hour period, the maximum number of licenses used
during that time is added to the syblicenseslog table. The 24-hour period begins when you
start or restart your server.

LOCK HASHTABLE SIZE


Units Integer Number of hash buckets
Default 2048
Minimum 1
Maximum 2147483647

The lock hashtable size parameter stipulates the number of hash buckets used in the lock hash
table. A hash bucket is a row in an ASE internal table (hash table) which is used for the
acquiring and releasing of internal locks.

The hash table manages all row, page, and table locks and all lock requests. Each time a task
acquires a lock, the lock is assigned to a hash bucket, and each lock request checks the same
hash bucket. If the number of hash buckets is low, tasks will be forced to read the same
bucket (row) and inherently may slow down as a result. It is recommended not to set this
value lower than the default value of 2048.

If you have approximately 15,000 objects on your server that need to use locks concurrently,
the optimal size of the hash table should be 20% of that number (3000).

LOCK SCHEME
Units allpages, datapages, datarows
Default allpages

The parameter lock scheme sets the default locking scheme for your tables. If you create a
table without specifying the locking scheme you would like to use, the table will default to
your configuration parameter.

LOCK SPINLOCK RATIO


Units Integer number of lock hash buckets
Default 85
Minimum 1
Maximum 2147483647

On ASE with multiple engines, you can determine the ratio of hash buckets to a spinlock. A
hash bucket is an internal row in a Sybase internal table used to manage locks on the server. A
spinlock is a brief internal lock that prevents a process from accessing system resources being
used by another process.

The number of spinlocks protecting the lock hash table is (lock hashtable size / lock spinlock
ratio); with the default values this works out to 2048/85, or roughly 24 spinlocks. If you
increase the value of lock hashtable size, be sure to adjust lock spinlock ratio so that the
number of spinlocks stays close to the default. To reduce spinlock contention, decrease lock
spinlock ratio, which creates more spinlocks, each protecting fewer hash buckets.
LOCK WAIT PERIOD
Units integer number of seconds
Default 2147483647
Minimum 0
Maximum 2147483647

The parameter lock wait period allows a quick way to find out if something is wrong with
your server. This can be done by setting a time-out mechanism for locks that limit their
waiting time. If you do not want processes to wait longer than, say, five minutes to acquire
locks, you can set this parameter to 300 (seconds). If a lock is not acquired after five minutes,
the process exits with a 1205 error code and rolls back the transaction if needed.

The default value for this parameter is to return the error code after 68 years. If your process
does not manage to acquire the lock after that time period, you will receive the error code, but
most likely you will not be there to see it. It makes you wonder about Sybase engineers and
how sure they are that this actually works.

READ COMMITTED WITH LOCK


Units 0 or 1 (flag)
Default 0 (off)

For select queries on DOL tables, this parameter instructs the server whether to hold shared
locks on the rows or pages of DOL tables. If it is left at the default of 0, deadlocking is
reduced and concurrency improved, since the server does not hold locks on the outer tables of
joins while rows are read from the inner tables.

ROW LOCK PROMOTION LWM


Units Number of row locks
Default 200
Minimum 2
Maximum (value of row lock promotion HWM)

The row lock promotion LWM (low water mark) parameter sets the number of locks below
which the server does not attempt to acquire a table lock on an object.

While it may be unwise to set this value very high, since locks will be exhausted quickly,
keeping this value at 200 may not be enough on some systems. Before deciding to increase
this value, check out other options such as sp_setrowlockpromote, which can set the row lock
promotion level per table and not for the entire server. If you have many queries that regularly
update hundreds of rows for several tables, it may be wise to increase this value.
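For example, to raise the thresholds for a single heavily updated table rather than for the
whole server (order_detail is a hypothetical table name):

sp_setrowlockpromote "table", order_detail, 500, 2000, 50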

ROW LOCK PROMOTION HWM


Units Integer number of row locks
Default 200
Minimum 2
Maximum 2147483647

This value instructs the server to not escalate from row locking to table locking until the
specified number of locks is obtained on a table.

ROW LOCK PROMOTION PCT


Units Integer Percentage of row locks
Default 100
Minimum 1
Maximum 100

If the number of row locks is between row lock promotion LWM and row lock promotion
HWM, this is the percentage of row locks (based on the number of rows in a table) permitted
before the server will escalate to a table lock.

Changed Configuration Parameters

The new lock spinlock ratio parameter (see above) replaces the following configuration
parameters:

• address lock spinlock ratio
• table lock spinlock ratio
• page lock spinlock ratio

Renamed Configuration Parameters

The following configuration parameters have been renamed:

Old Name             New Name
lock promotion HWM   page lock promotion HWM
lock promotion LWM   page lock promotion LWM
lock promotion PCT   page lock promotion PCT

New Configuration Parameters for Version 12


CHECK PASSWORD FOR DIGIT
Units 0 or 1 (flag)
Default 0 (off)

A user with the SSO role can instruct the server to check for at least one character or digit in a
password. This parameter, when set, affects only newly set passwords, and it raises the
security options offered by the ASE server.

DTM DETACH TIMEOUT PERIOD


Units Integer minutes
Default 0
Minimum 0
Maximum 2147483647

The parameter DTM detach timeout period sets the timeout period, in minutes, for detached
transactions. In some X/Open XA environments, a transaction may become detached from its
thread of control (usually to become attached to a different thread of control). After the time
specified by this parameter, the server rolls back the detached transaction.

DTM LOCK TIMEOUT PERIOD


Units Integer Seconds
Default 300
Minimum 1
Maximum 2147483647
The parameter DTM lock timeout period sets the maximum time, in seconds, that a
distributed transaction participant waits on a lock request. If the lock request is not satisfied
within the specified time, the server considers the transaction to be in a deadlock and times
out the lock request, which causes the entire distributed transaction to be rolled back. Unless
your transaction must be completed within a specific time period, the default five minutes
should satisfy most applications.

This parameter exists because distributed transactions can cause two servers to deadlock
between themselves. Since a server can only detect deadlocks among its own processes, such
cross-server deadlocks could hang indefinitely if this value were not set.

ENABLE DTM
Units 0 or 1 (flag)
Default 0 (off)

This parameter enables or disables the Distributed Transaction Management feature for the
ASE server. When the DTM feature is enabled, you can use Adaptive Server as a resource
manager in X/Open XA and MSDTC systems.

ENABLE HA
Units 0 or 1 (flag)
Default 0 (off)

This parameter enables the server to be configured for Sybase’s failover in a high availability
system.

ENABLE JAVA
Units 0 or 1 (flag)
Default 0 (disabled)

This parameter enables the server to be configured for Sybase’s Java services. The server
must be restarted for this option to take effect.

ENABLE XACT
Units 0 or 1 (flag)
Default 1 (on)

This parameter enables the server’s distributed transaction coordination services, whereby
transactions can be executed on multiple servers. If it is disabled, it is still possible to run
remote procedure calls between servers, but those transactions are not coordinated by the
server, which means, for example, that they cannot be rolled back as a unit.

MAXIMUM FAILED LOGINS


Units Integer failed login attempts
Default 0
Minimum 0
Maximum 32767

For increased security, you can set a maximum number of failed logins per user. The number
is the count of consecutive failed login attempts permitted before the targeted login is locked.
The count is reset to 0 after every successful login.

MINIMUM PASSWORD LENGTH


Units Integer
Default 6
Minimum 0
Maximum 30

This parameter sets a server-wide value for minimum password length for both logins and
roles. The minimum password length is defined as the minimum number of characters a
password must have to be accepted for any new login. This parameter, once set, affects only
new passwords.

NUMBER OF DTX PARTICIPANTS


Units Integer number of transactions
default 500
minimum 100
maximum 2147483647

The parameter number of DTX participants sets the maximum number of local and remote
distributed transaction participants that can be active at any one time. If you run distributed
transactions, you may want to monitor the number of DTX participants (using
sp_monitorconfig, as shown below) and examine the #max ever used value. Increase this
parameter if that value is close to the configured number of DTX participants.
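For example:

sp_monitorconfig "number of DTX participants"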

SIZE OF GLOBAL FIXED HEAP


Units Integer pages
default 150 on 32 bit version, 300 on 64 bit versions
minimum 1
maximum 2147483647

The parameter size of global fixed heap specifies the memory space the server uses for
internal data structures and other internal needs of its Java services. This memory is allocated
in increments of 2K pages. If you change the size of the global fixed heap, you must also
change the total memory by the same amount.

SIZE OF PROCESS OBJECT FIXED HEAP


Units Integer Pages
Default 150 for 32 bit version, 300 for 64 bit versions
Minimum 70
Maximum 2147483647

The parameter size of process object fixed heap allocates memory space for Java objects on
the server during a user session. If you change this value, you must also change total memory:
multiply the per-connection increase by the number of user connections and raise total
memory by that amount. For example:

New value = 2,560 pages (5 MB) per user connection
Users = 10
New "total memory" value = original value + 50 MB

SIZE OF SHARED CLASS HEAP


Units Integer Pages
default 1536 for 32 bit version, 3072 for 64 bit version
minimum 550
maximum 2147483647
This parameter specifies the shared memory space for Java classes that are called into the Java
virtual machine. If you increase this value, you must also increase the total memory parameter
by the same amount of memory.

STRICT DTM ENFORCEMENT


Units 0 or 1 (flag)
default 0 (off)

The parameter strict dtm enforcement ensures that transactions are only sent to servers that
are configured to coordinate transactions. If a transaction attempts to update data in another
server that does not support transaction coordination, it is aborted. If some of your
transactions update values on servers that do not support transaction coordination (older
versions, other systems, etc.), leave this value at 0.

TEXT PREFETCH SIZE


Units Integer number of text pages
default 16
minimum 0
maximum 65535

The text prefetch size parameter limits the number of pages of text and image data that can be
prefetched into an existing buffer pool.

TXN TO PSS RATIO


Units Integer transaction descriptor
default 16
minimum 1
maximum 2147483647

The parameter txn to pss ratio determines the total number of transaction descriptors available
to the server. Transaction descriptors are internal memory structures used to manage
transactions. You can view the percentage of active transaction descriptors in use, as well as
the “max ever used” value, to determine whether an increase may benefit your server. Use
sp_monitorconfig "txn to pss ratio" to view these values.

XACT COORDINATION INTERVAL (LEVEL)


Units Integer Seconds
default 60
minimum 1
maximum 2147483647

The parameter xact coordination interval defines the length of time between attempts to
resolve transaction branches that were propagated to remote servers. If a connection gets
dropped on a remote server, the server attempts a reconnection after the value specified for
this parameter.

If you have many queries that complete distributed transactions under a minute, you can lower
the value of the parameter without seeing any performance penalties. Raising this value is not
recommended.

Changes to Existing Configuration Parameters

• enable cis — the default value has been changed from 0 (off) to 1 (on).
Sybase ASE 12.5 Configuration Changes

The new configuration options in version 12.5 give the database administrator enormous
flexibility. Many parameters that were static prior to 12.5 are now dynamic. In fact, this has
been a trend over the course of releases. Dynamic means that a change in these parameters
will not require a server recycle to take effect. In addition to the dynamic configuration
options, Adaptive Server 12.5 allows the DBA to choose different memory allocation options,
depending on his/her need.

What’s New in ASE 12.5?

ASE 12.5 allows administrators more flexibility in changing parameters without restarting the
server. Today, more than ever, many shops and businesses cannot afford downtime, and ASE
12.5 provides solutions to this problem. In the past, if you added users, databases, devices,
open objects, etc., you were required to restart the server. If the business could afford very
little downtime, you had to either rely on the server rebooting correctly or actually be present
for these changes (sometimes around midnight). This is no longer the case.

The main change in ASE 12.5 deals with memory parameters and how Adaptive Server
allocates memory. We explain in detail the new options and what they should be set to. Make
sure that you are certain of the consequences of the new options you set, since they are
different from all previous versions.

Upgrading

If you are upgrading, the upgrade process calculates the value for total memory, procedure
cache percent, and min online engines, and the values are inserted directly into the new 12.5
server. If the value of any of these parameters is less than the new default values, the values
are adjusted to the new default values accordingly.

Make sure to set max memory to the total memory available for ASE 12.5 on your machine.

New Configuration Parameters


MAX MEMORY (TOTAL MEMORY in Sybase 12 GA)
Units Integer Amount of total memory
Default Platform dependent
Minimum N/A
Maximum 2147483647

The new parameter max memory specifies the maximum amount of total physical memory
that is available to the server. The memory allocated in this parameter must be greater than the
memory required by the configuration options set on the server.

It is possible to configure logical memory to equal the max memory on your server if your
system is a dedicated database machine. If there are other processes that run on your system,
occupying the total memory available on the machine may disrupt their functionality. In
addition, ASE may not start if other processes occupy memory that it needs for startup.
This parameter has been made dynamic in ASE 12.5, but in most cases, it only allows the
addition of memory. If you’d like to decrease the amount of memory, you’ll have to change
the configuration options and restart the server. This does not apply to certain cases, which
we’ll discuss later.

Note Use the new global variable @@tot_physmem to check the amount of total physical
memory your 12.5 server is currently using.
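For example, a sketch that caps the server at 512 MB (max memory is specified in 2K pages,
so 512 MB = 262,144 pages):

sp_configure "max memory", 262144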

Memory in ASE

Memory in ASE 12.5 is divided into logical memory (the memory required by the server's
current configuration) and physical memory (the memory the server has actually allocated and
is using). Logical memory is the sum of the memory required by all configurable parameters
(set with sp_configure). Let’s look, for example, at the number of user connections parameter:

Parameter Name             Default Memory Used Config Value Run Value
-------------------------- ------- ----------- ------------ ---------
number of user connections 25      15888       200          200

The parameter has been set so that 200 users can connect to the server at any one time. The
logical memory requirement is a little more than 15 MB for these user connections. ASE 12.5
allows you to do one of two things:

• Allocate all 15 MB to user connections immediately upon startup.
• Allocate only the memory currently needed, depending on how many users are
connected. If only 40 users are connected, ASE will use only about 3 MB for this
configuration value.

To configure these options, a new parameter, allocate max shared memory, is used.

allocate max shared memory

The parameter allocate max shared memory (discussed later in the chapter) can be set to either
0 (default) or 1. When set to 0, the server will occupy only the amount of memory required by
the server. For example, if the parameter max memory is set to 500 MB but only 300 MB are
used (not all databases are used, not all users are logged in, not all objects are open, etc.), the
server will only occupy 300 MB of the total memory.

When this parameter is set to 1, the server will allocate the entire value specified in max
memory (for example, the entire 500 MB in the example above).

Set the value of allocate max shared memory to 0 if your server will be set up with parameters
that are close to what you will keep. If you set this value to 0 and then configure your server,
you will see some performance degradation while ASE 12.5 initializes shared memory
segments.

Set this value to 1 if you plan to configure your server while it’s online in a production
environment. If you increase the amount of a parameter, for example, the server will already
occupy the memory and you will see no performance degradation.

Note If you decide to use value 1 for allocate max shared memory and your memory is set to
a high value but not used, you may be wasting your physical memory.

dynamic allocation on demand

The dynamic allocation on demand parameter, which is discussed later in the chapter, is a new
parameter in ASE 12.5. The parameter, if set to 1, allocates memory as needed. If you
increase the number of open databases parameter from 5 to 12, the memory that is needed for
the additional seven databases will be allocated when you create the databases and not
immediately. If the value is set to 0, the amount of memory needed for the additional seven
databases will be allocated immediately.

How Do I Decrease Memory Configuration Parameters?

You can only decrease memory on ASE 12.5 if your dynamic allocation on demand parameter
is set to 1. If it is set to 1, configured memory is not yet used and could be lowered. If the
option is set to 0, and the server immediately allocates memory for additional configuration
parameters, the only way to reduce memory is by lowering the configuration parameters and
restarting the server.

Default Data Cache

In versions prior to ASE 12.5, the value of the default data cache was the amount of memory
left after allocating cache to user configuration and the procedure cache.

In ASE 12.5, the default data cache is set by the DBA as an absolute value (i.e., 200 MB). The
default value, if you install a new ASE 12.5 version, is 8 MB. If you upgrade from an older
version, the server sets your default data cache size to the run value of the default data cache
in the configuration file of your older ASE version.

Note Use sp_configure total memory to view the total memory your server is configured to
use prior to version 12.5.
Note Use sp_cacheconfig to see the value that your default data caches are set to.
PROCEDURE CACHE SIZE
Units Integer Size of procedure cache in 2k pages
default 3271
minimum 3271
maximum 2147483647
Note Use sp_configure procedure cache size to view your current procedure configuration.
Under the Memory Used column is the amount of memory in megabytes.

The parameter procedure cache size is part of ASE 12.5’s new memory feature. While in
previous versions it was a percentage of the total memory, in the 12.5 release, it is actually
specified in 2K pages. Adaptive Server uses the procedure cache to store procedures and
compile queries while creating stored procedures.

Note If you need to set your procedure cache to 100 MB, use:

sp_configure "procedure cache size", 51200

The procedure cache size is specified in 2K pages (100 MB = 51,200 2K pages).


Depending on how your server is used for queries, you will need to set this value accordingly.

If you are upgrading from a previous version, the server will set the absolute value of this
parameter according to your older version. If, for example, your older version had:

Total memory — 65 MB
Procedure cache — 20%
Total procedure cache size — 13 MB (65 MB * 20% = 13 MB)

Your version will configure the procedure cache size parameter to 6500 pages (13 MB).

NUMBER OF ENGINES AT STARTUP


Units Integer Number of CPU’s
default 1
minimum 1
maximum (Number of CPU’s on the machine)

In pre-12.5 versions of ASE, max online engines and min online engines were used to tell the
server how many engines it could handle. If the min online engines value was 4, you could
not take engines offline below that number.
ASE 12.5 eliminated min online engines and changed the intent of max online engines. Now,
you can take all engines, besides engine 0, offline at anytime. The parameter max online
engines is used to set a high value of engines to be taken online at once. The value of max
online engines does not consider the number of CPUs available at startup, which means
administrators can add CPUs at a later date.

As opposed to max online engines, number of engines at startup cannot be greater than the
number of CPUs available on your machine or greater than the value of max online engines.

If you intend to bring engines online after startup, be aware that the difference between the
two parameters takes about 2 MB of memory per engine, so it is best to estimate how many
engines you’ll bring online before setting the values of these parameters.

If you do not intend to bring engines online after startup, and you use all your CPUs as
engines, you should set these two parameters to an equal value.

This option remained static in version 12.5, since it is used during startup and requires a
reboot after changing it.

Note Use the new stored procedure sp_engine to bring an engine online or offline:

sp_engine {"online" | offline} [, engine_id]
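For example (the engine number 3 is illustrative):

-- bring one additional engine online
sp_engine "online"

-- take engine 3 offline (engine 0 cannot be taken offline)
sp_engine offline, 3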

ALLOCATE MAX SHARED MEMORY


Units 0 or 1 (flag)
default 0

The parameter allocate max shared memory allows you to determine whether you want the
server to allocate all the memory specified by max memory or only the amount of memory
necessary for your current configuration.
By leaving the parameter at 0, you ensure that the server uses only the amount of memory
currently required by the server. Setting this parameter to 1 means that the server allocates all
of the memory specified in max memory at startup.

If the value is set to 1 and you increase max memory, the server will adjust to the new value
by adding shared memory segments to include the added memory.

The downside of allocating all of max memory up front is that if you do not predict memory growth accurately and set max memory too high, physical memory is wasted on the unused allocation.
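A minimal sketch of turning on up-front allocation, assuming you have already sized max memory appropriately:

-- grab all of max memory at startup instead of allocating it on demand
sp_configure "allocate max shared memory", 1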

DYNAMIC ALLOCATION ON DEMAND


Units 0 or 1 (flag)
default 1

The parameter dynamic allocation on demand allows you to choose how memory is allocated
for dynamic configuration on your server.

If you keep the default value of 1, memory is allocated only as needed. If you increase number of locks, which in ASE 12.5 is dynamic, the server allocates memory for the new locks as they are requested, until it reaches the limit of the new value.

If you change the value to 0, all memory required for a dynamic configuration change is immediately assigned to the server, even if it is not yet used.

The default setting of 1 lets the server use only the memory it actually requires, since it allocates memory for dynamic configuration changes only when they are requested.
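For example, to keep (or restore) the default on-demand behavior:

-- allocate memory for dynamic configuration changes only as it is actually needed
sp_configure "dynamic allocation on demand", 1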

New Dynamic Configuration Options

In Sybase ASE 12.5, many of the parameters that were previously static are now dynamic, meaning they can be changed with sp_configure without restarting the server (a sketch follows the list below). The configuration options that are now dynamic are:

• additional network memory


• audit queue size
• cpu grace time
• deadlock pipe max messages
• default database size
• default fill factor percent
• disk i/o structures
• errorlog pipe max messages
• max cis remote connections
• memory per worker process
• number of alarms
• number of aux scan descriptors
• number of devices
• number of dtx participants
• number of Java sockets
• number of large i/o buffers
• number of locks
• number of mailboxes
• number of messages
• number of open databases
• number of open indexes
• number of open objects
• number of pre-allocated extents
• number of user connections
• number of worker processes
• open index hash spinlock ratio
• open index spinlock ratio
• open object spinlock ratio
• partition groups
• partition spinlock ratio
• permission cache entries
• plan text pipe max messages
• print recovery information
• process wait events
• size of global fixed heap
• size of process object heap
• size of shared class heap
• size of unilib cache
• sql text pipe max messages
• statement pipe max messages
• tape retention in days
• time slice
• user log cache spinlock ratio
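For example, a hedged sketch of raising one of these now-dynamic parameters without a restart (the value 10000 is purely illustrative):

-- number of locks is dynamic in 12.5, so this takes effect immediately
sp_configure "number of locks", 10000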

Changed Configuration Options


TOTAL MEMORY
Units
default 60
minimum 1
maximum 2147483647

In versions prior to ASE 12.5, total memory indicated the amount of memory the server
allocated from the operating system in 2K units.

In ASE 12.5, total memory has been replaced by max memory, the maximum amount of memory available to the server. Total logical memory is a new read-only parameter that displays the total logical memory required by the current configuration; in other words, it shows how much memory the server's current configuration will consume.
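A hedged sketch of working with the new parameters (the 102400-page value, about 200 MB, is only illustrative):

-- raise the ceiling on memory available to the server (value in 2K pages)
sp_configure "max memory", 102400

-- total logical memory is read-only; this simply reports what the current configuration requires
sp_configure "total logical memory"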

Deleted Configuration Options

The following configuration options have been deleted in ASE 12.5:

• engine adjust interval


• freelock transfer block size
• max cis remote servers
• max engine freelocks
• max roles enabled per user
• min online engines
• number of languages in cache
• procedure cache percent

Chapter 15: How to Read sp_sysmon Output

Introduction

With SQL Server System 11, Sybase introduced a new tool, the system stored procedure sp_sysmon, to help monitor server performance. Use it to establish a baseline performance profile, since it reports on a wide range of tasks and internal Adaptive Server activity. By interpreting the output you can understand the causes of performance problems and decide what further action to take.

sp_sysmon is a specialized system procedure that produces a statistical report on Adaptive Server activity. It provides an overall performance picture, letting you observe Adaptive Server from the inside. The report runs to many lines of output; using the isql input and output redirect flags, the output can be saved to a file and reviewed later. (This is a great idea for long-term tracking purposes!) The statistics cover a wide range of Adaptive Server system activities, providing an integrated look at overall system performance.
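For example, a minimal sketch of capturing a run to a file with isql (the login, server name, and file names are placeholders). Put the following in sysmon.sql:

sp_sysmon "00:15:00"
go

and run it from the operating system prompt, redirecting the report to a file:

isql -Usa -SMYSERVER -i sysmon.sql -o sysmon_0900.out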

How Does sp_sysmon Work?

Before running sp_sysmon, be clear about the purpose of the run and how the output will help you get the information you need.

Syntax:

sp_sysmon interval [,section [,applmon]]


Note The interval should be entered in “hh:mm:ss” format.

For example, if monitoring only to check the general health of the system, then sp_sysmon
can use a time interval such as:

sp_sysmon "00:15:00"
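You can also restrict the report to a single section; for instance, a sketch that samples for five minutes and prints only the kernel utilization section (kernel is assumed here to be one of the supported section keywords; check your server documentation for the full list):

sp_sysmon "00:05:00", kernel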

Adaptive Server maintains several internal counters that keep track of its activity so that the information can be reported when a run completes. Whenever sp_sysmon is invoked, it clears all accumulated data from these counters and then waits until the user-specified time interval elapses. During that interval the counters are incremented by whatever activity takes place on the server. At the end of the interval, the procedure reads the counters, prints the report, and stops executing.

Note Only activity that takes place during the sample interval is counted. For example, if new users connect and run work during the interval, that activity is captured; work that already-connected users completed before the interval began is not reflected.

Although it is wise to run sp_sysmon to spot-check the health of the system, it may not be wise to run it continuously on a daily basis: there is overhead in running the command, and it increases with the number of CPU engines you define. However, it is a good idea to run it before and after any server tuning so that you can compare the data and see the effect of the changes. It can also be worthwhile to run sp_sysmon when investigating the behavior of an application or stored procedure; the output gives a better understanding of the changes that may be required to the existing configuration.

Note Whenever a behavior needs to be investigated, run sp_sysmon while the server is fully loaded rather than when it is relatively idle. This gives a better picture of the various tasks and how they behave.

When to Run sp_sysmon

To get realistic information about server behavior, start the applications first and then start sp_sysmon; that gives the data cache a chance to fill. What you measure also depends on the purpose of the test. For example, if the test is meant to measure capacity, keep the server extremely busy for the entire duration of the test. If the CPU is idle during part of the sample, some of the values in the output can look very low. A typical example is any data-per-second figure: if the CPU is not busy during that part of the test period, the rate may come out very low and not be a true representation.

In general, sp_sysmon produces valuable information under the following circumstances:

• Whenever a cache or pool configuration is changed


• Whenever a change is made using sp_configure
• When new queries are migrated to the applications
• Whenever SQL Server engines are increased or reduced
• Whenever new disk devices are added and objects are assigned to them
• To check for contention when the server is extremely busy
• Whenever a stress test is to be performed
• When applications are extremely slow or behaving abnormally
• For general trending and tracking

Some of the information derived from the output gives minute detail about certain types of queries, their indexes, how they are processed, and how updates behave. You can also see how the caches are used by particular query types and by mixed query workloads.

Although the output is informative, remember that a single run is only a spot check, not a trend. Do not rush to change configuration values based on one sample; settings that look poor under one query or application mix may behave well under another. Careful analysis and thought need to go into any change based on sp_sysmon output, because the report mainly reflects behavior during that particular sample interval.

The sp_sysmon report exposes internal system behavior that is hard to observe with other system functions or stored procedures. It is important to study the entire report and understand the full impact of a change before making any modifications; sometimes removing one performance bottleneck simply exposes or creates another. Knowing what to tune, and when to stop tuning, are both judgment calls, which is why expertise is needed to study the data and analyze the situation. This chapter will help guide you in reading the report output and interpreting it.

How to Use the Data

sp_sysmon does not give much information for tuning a query; instead, it gives server-wide
information. It is better to not use sp_sysmon and SQL Monitor at the same time because both
use the same counters. The counters will be reset when the second process starts, and the
results will be invalid.

Every tenth of a second, the engine's alarm (SIGALRM) interrupt handler checks whether the engine is busy running a user task. If it is busy at that instant, the busy tick counter is incremented; otherwise the idle counter is incremented. At the end of the sp_sysmon execution, dbcc monitor is called to collect the accumulated tick counts. If there are multiple engines, each engine has its own set of counters; the total across all engines divided by the number of engines gives the average.

sp_sysmon gives a snapshot of activity for the time interval specified. Building a history of server health therefore requires running sp_sysmon at various times, including periods when the CPU is busy. Comparing samples from when the CPU was least busy and when it was most busy gives a good idea of the server's overall health.

There are nine major tasks that sp_sysmon reports. They are:

• Kernel Utilization
• Task Management
• Transaction Management
• Lock Management
• Index Management
• Disk I/O Management
• Data Cache Management
• Procedure Cache Management
• Network I/O Management

Let’s analyze each task and the interpretation of the outputs.

Kernel Utilization

The sp_sysmon stored procedure reports on the activities that take place inside the Adaptive
Server, including engine busy utilization, CPU yields by the engine, network checks, and disk
I/O checks.

The following displays the output of kernel utilization.

Kernel Utilization
------------------

Engine Busy Utilization


Engine 0 0.2 %
Engine 1 0.2 %
Engine 2 1.2 %
Engine 3 0.7 %
Engine 4 1.0 %
Engine 5 0.2 %
--------- ----------- -------------
Summary Total 3.3 % Average 0.6 %

CPU Yields by Engine per sec per xact count % of total


----------------------- ------- -------- ------ ----------
Engine 0 50.3 2.9 3019 16.7 %
Engine 1 50.4 2.9 3023 16.7 %
Engine 2 50.2 2.9 3010 16.6 %
Engine 3 51.0 3.0 3059 16.9 %
Engine 4 49.9 2.9 2994 16.5 %
Engine 5 50.0 2.9 3001 16.6 %
----------------------- ------- -------- ------ ----------
Total CPU Yields 301.8 17.6 18106

Network Checks
Non-Blocking 15.9 0.9 953 100.0 %
Blocking 0.0 0.0 0 0.0 %
----------------------- ------- -------- ------ ----------
Total Network I/O Checks 15.9 0.9 953
Avg Net I/Os per Check n/a n/a 0.91501 n/a

Disk I/O Checks


Total Disk I/O Checks 35423.9 2061.5 2125434 n/a
Checks Returning I/O 18213.2 1059.9 1092790 51.4 %
Avg Disk I/Os Returned n/a n/a 0.00111 n/a
====================================================================

Engine Busy Utilization

This section shows how busy each engine was, expressed as a percentage of the sample interval; idle time is not reported. The average is calculated by dividing the sum of the engines' busy percentages by the number of engines.

However, the CPU usage values reported by operating system tools, such as top on UNIX, will differ. Operating system commands that check CPU activity may report higher numbers because they count the engine's idle looping as busy time, whereas Engine Busy Utilization does not include time spent looping, since that is considered idle time. When Adaptive Server has no task to process, it loops, checking for network I/O, completed disk I/O, and tasks in the run queue.

Note One measurement that cannot be made from inside Adaptive Server, and therefore does not appear in the sp_sysmon report, is the percentage of time Adaptive Server had control of the CPU versus the time the CPU was in use by the operating system.

This part of the report helps you determine how many engines are being used effectively and how many are actually required. If an engine's usage percentage is very high and stays high, consider adding another engine; once the engine is added, check for resource contention in other areas.
Conversely, if a few engines consistently show 0 percent in the Engine Busy Utilization section, Adaptive Server may have more engines than it actually requires, and the server can take on more workload. The following shows another scenario, where the average percentage is very misleading:

Engine Busy Utilization


Engine 0 97.2 %
Engine 1 0.0 %
Engine 2 0.0 %
Engine 3 0.0 %
Engine 4 0.0 %
Engine 5 0.0 %
------------ --------------- ----------------
Summary Total 97.2 % Average 16.2 %
Note You can lower the runnable process search count parameter using sp_configure to
reduce the time that Adaptive Server spends checking for the I/O while it is idle. This
will reduce the check that loops through looking for a runnable task before it yields the
CPU. Note that if it shows lower CPU usage, but overall throughput or transaction rates
don’t change, you may want to change it back.
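A hedged sketch of that adjustment (the value 1000 is purely illustrative; measure throughput before and after):

-- shorten the idle search loop so an idle engine yields the CPU sooner
sp_configure "runnable process search count", 1000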

Another thing to consider is that in an SMP environment, if there is no lock contention, the
task continues to run in the same engine. Seemingly, this may look as if other engines are not
effectively used.

If you see output as above, you probably have a single process running, which is not running
in parallel. It is useful to look at load balances in this section to help understand your system’s
load.

CPU Yields by Engine

This part of the output reports the number of times each Adaptive Server engine yielded to the operating system. The % of total column shows each engine's yields as a percentage of the combined yields for all engines.

Total CPU Yields combines the data from all engines. A high value for CPU Yields by Engine indicates that the engine is voluntarily yielding the CPU to other processes.

Note If the Engine Busy Utilization data indicates low engine utilization, use CPU Yields by Engine to determine whether the engine is genuinely idle and yielding voluntarily, or whether the operating system is simply not scheduling the engine on a CPU.

Network Checks

Network Checks gives all the information about blocking and non-blocking network I/O
checks. This also gives information on the total number of I/O checks for the interval and the
average number of network I/Os per network check.

Note Adaptive Server has two methods of checking for network I/O: blocking and non-blocking. Non-Blocking reports the number of non-blocking checks, in which the engine checks the network for I/O and continues processing whether or not any I/O is waiting. Blocking reports the number of blocking checks.
Whenever an engine yields to the operating system because it is not busy, it wakes once every clock tick to look for any I/O to be processed. Once the engine begins processing that I/O, it does not take on other work until the processing is complete.

Note Reduce the latency period by increasing the runnable process search count parameter.
This will allow the Adaptive Server engine to loop for a longer period of time.

Total Network I/O Checks

This report gives the number of times the engine checks for incoming and outgoing packets;
this information is used with CPU Yields by Engine.

Average Network I/Os per Check

Avg Net I/Os per Check gives information on the average number of network I/Os for the
Adaptive Server. This information is only for the duration of the sample interval.

Note If network I/Os arrive much less often than the engines check for them (that is, the average number of I/Os per check is well below one), the frequency of network I/O checking can be reduced.

Total Disk I/O Checks

Total Disk I/O Checks reports the number of times engines check for disk I/O.

Adaptive Server puts any task that does an I/O to sleep as soon as it starts the I/O. This is
done because that task is waiting to be completed. Meanwhile, the engine will process other
tasks and check for completed I/Os. When it finds them, it moves the task from the sleep
queue to the run queue.

Checks Returning I/O

Whenever an engine checks for disk I/O and is successful with the search, the Checks
Returning I/O count is incremented.

This check is good only for the time of the sample interval and depends upon how busy the
engine is at the sampling time.

Note If the percentage of checks returning I/O seems low, the i/o polling process count can be increased so that the engine checks less often and more completed I/Os are found per check.

Tuning the parameter I/O polling process count will affect both response time and
throughput. This parameter is tunable using sp_configure.
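For instance, a hedged sketch (the value 100 is illustrative; compare sp_sysmon output before and after the change):

-- let the engine run more tasks between I/O completion checks
sp_configure "i/o polling process count", 100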

Average Disk I/Os Returned

Avg Disk I/Os Returned reports the average number of disk I/Os returned over all Adaptive
Server engine checks combined.
The interval between I/O checks is tunable: increasing the i/o polling process count lengthens the processing loop between checks, so the Adaptive Server engine spends more time processing tasks and less time checking for I/O.

Note Tuning i/o polling process count can improve some of these counts, but it may reduce response time or throughput for the system as a whole. Analyze carefully before changing this parameter; the right setting always depends on the information you are looking for and the behavior you are trying to achieve.

Worker Process Management

This section reports the number of worker processes that were granted and denied and the
success and failure of memory requests for worker processes.

Use the Max Ever Used During Sample value to help tune the number of worker processes configuration parameter. Memory for worker processes is allocated from the memory pool configured with the memory per worker process parameter.

Avg Mem Ever Used by a WP

This row reports the maximum average memory used by all active worker processes at any
time during the sample interval. Each worker process requires memory, principally for
exchanging coordination messages. Adaptive Server allocates this memory from the global
memory pool. The size of the pool is determined by multiplying the two configuration
parameters, number of worker processes, and memory per worker process. If the number of
worker processes is set to 50 and memory per worker process is set to the default value of
1024 bytes, 50K is available in the pool. Increasing the memory for worker process to 2048
bytes would require 50K of additional memory.

At startup, static structures are created for each worker process. While worker processes are in
use, additional memory is allocated from the pool as needed and deallocated when not
needed. The average value printed is the average for all static and dynamic memory allocated
for all worker processes, divided by the number of worker processes actually in use during the
sample interval.

If a large number of worker processes are configured, but only a few are in use during the
sample interval, the value printed may be inflated due to averaging in the static memory for
unused processes.

If Avg Mem is close to the value set by memory per worker process and the number of
worker processes in Max Ever Used During Sample is close to the number configured, you
may want to increase the value of the parameter. If a worker process needs memory from the
pool, and no memory is available, the process prints an error message and exits.
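A hedged sketch of such an increase (2048 is illustrative; remember the pool grows by this amount for every configured worker process):

-- double the per-worker-process allotment from the 1024-byte default
sp_configure "memory per worker process", 2048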

Note For most parallel query processing, the default value of 1024 is more than adequate. The exception is dbcc checkstorage, which can use up to 1792 bytes if only one worker process is configured. If you are using dbcc checkstorage and the number of worker processes is set to 1, you may want to increase memory per worker process.
Worker Process Management            per sec   per xact   count   % of total
------------------------------------ -------   --------   -----   ----------
Worker Process Requests
  Total Requests                          0.0        0.0       0      n/a

Worker Process Usage
  Total Used                              0.0        0.0       0      n/a
  Max Ever Used During Sample             0.0        0.0       0      n/a

Memory Requests for Worker Processes
  Total Requests                          0.0        0.0       0      n/a
=============================================================================

Task Management

sp_sysmon reports performance statistics for task management in a tabular format. Each row
represents a specific activity or event by the Adaptive Server. The following represents a
partial output of sp_sysmon for task management.

Task Management                      per sec   per xact    count   % of total
------------------------------------ -------   --------   ------   ----------
Connections Opened                       0.0        0.0        1      n/a

Task Context Switches by Engine
  Engine 0                             306.9      174.1    39523    100.0 %

Task Context Switches Due To:
  Voluntary Yields                      10.1        5.7     1299      3.3 %
  Cache Search Misses                  240.9      136.7    31025     78.5 %
  System Disk Writes                     9.1        5.2     1171      3.0 %
  I/O Pacing                            11.9        6.7     1526      3.9 %
  Logical Lock Contention                0.5        0.3       62      0.2 %
  Address Lock Contention                0.0        0.0        3      0.0 %
  Log Semaphore Contention               0.1        0.0       11      0.0 %
  Group Commit Sleeps                    0.4        0.2       55      0.1 %
  Last Log Page Writes                   5.7        3.2      730      1.8 %
  Modify Conflicts                       0.3        0.1       34      0.1 %
  I/O Device Contention                  0.0        0.0        0      0.0 %
  Network Packet Received                1.0        0.6      125      0.3 %
  Network Packet Sent                    5.4        3.1      697      1.8 %
  SYSINDEXES Lookup                      0.0        0.0        0      0.0 %
  Other Causes                          21.6       12.3     2785      7.0 %
=============================================================================
Transaction Profile
-------------------

Transaction Summary                  per sec   per xact    count   % of total
------------------------------------ -------   --------   ------   ----------
Committed Xacts                          1.8        n/a      227      n/a

Transaction Detail                   per sec   per xact    count   % of total
------------------------------------ -------   --------   ------   ----------
Inserts
  Heap Table                           141.2       80.1    18189     66.8 %
  Clustered Table                       70.3       39.9     9053     33.2 %
------------------------------------ -------   --------   ------   ----------

Per Second and Per Transaction Data

The per transaction data is generally more meaningful in benchmarks or in test environments where the workload is well defined, because the number of transactions is also well defined there, which makes comparisons between runs more meaningful. Per transaction data is likewise useful for judging the validity of percentage results.

Percent of Total and Count Data

In general, percentages are useful for understanding overall trends, but they can be very misleading if viewed in isolation. The meaning of the % of total data varies, depending on the context of the event and the totals for the category. The count data is the total number of events that occurred during the sample interval.

Per Engine Data

In most cases, per engine data for a category shows a fairly even balance of activity across all engines, but the report may show an uneven distribution in the following circumstances:

• If there are fewer runnable processes than engines, some engines will show no activity.
• A large bulk insert is running while other activity (for example, a few small selects or inserts) is low.

Under these conditions, network and disk I/O may be unbalanced across engines, and that will be reflected in the engines' utilization.
Total or Summary Data

Summary rows provide an overview of the totals and averages of Adaptive Server activity. The summary can be misleading when the data is skewed, so it should not be the sole basis for a configuration change. For example, if one engine is 90 percent utilized and the other is only 10 percent utilized, the report shows an average of 50 percent for the two engines; that figure looks the same as an evenly distributed 50 percent load but describes a very different situation.

The Task Context Switches by Cause report is one of the starting points for evaluating the
Adaptive Server’s performance. This sub-report of task management gives information about
why a certain task leaves the engine before completing. Thus, every event under the heading
“Task Context Switches Due to” reveals the reasons for the switches. Analyzing these reasons
can lead to an understanding of some of your performance parameters.

Voluntary Yields reports the number of times that a task completed or yielded after running
within that sampled time.

Cache Search Misses reports the number of times a task was switched out while performing a
physical read after a cache was missed.

Note If Cache Search Misses accounts for a large share of the context switches, consider adding memory to your data caches or adding physical memory to your server.

System Disk Writes reports on the number of times a task was switched out while performing
a write to disk or because it needed to access a page that was being written by another
process, such as the housekeeper or a checkpoint process.

Note To keep the task from flooding the disk I/O subsystems during certain operations that
need to perform large amounts of I/O, the Adaptive Server paces the number of disk
writes it issues. Also, the checkpoints and transaction commits write a large number of
log pages.

I/O Pacing reports on the number of times a task was switched out and then slept until a batch
of writes completed. Logical Lock Contention reports on the number of times a task was
switched out while it slept because of the contention for locks on tables, data pages, or data
rows. Address Lock Contention reports on the number of times a task was switched out
because of locks on the index pages of allpages locking tables. Address lock contention
blocks access to data pages. Log Semaphore Contention reports on the number of times a task
was switched out while waiting to access the transaction log semaphore held by another task.

Note This field applies to SMP systems only. Also, high contention for log semaphore
indicates that the User Log Cache (ULC) is too small.

Group Commit Sleeps reports on the number of times a task performed a transaction commit
and was put to sleep until the log was written to disk. Last Log Page Writes indicates the
number of times a task was switched out because it was put to sleep while writing the last log
page. Modify Conflicts reports on the number of times a task was switched out while waiting
to gain access to a page tied up by a modify conflict mechanism. I/O Device Contention
reports on the number of times a task was put to sleep while waiting for the pending I/O
queue for a particular device.

Network Packet Received reports on one of two situations:

• The task received part of a multi-packet tabular data stream (TDS) batch and was
switched out while waiting for a client to send the next TDS packet of the batch.
• The task completely finished processing a command and was switched out while in a
receive sleep state waiting to receive the next command or packet from the client.

Network Packet Sent reports the number of times a task went into a send sleep state while
waiting for the network to send each TDS packet to the client. SYSINDEXES Lookup reports
on the number of times a task went to sleep waiting for another task to release the lock on the
sysindexes table. Other Causes reports the number of tasks that switched out for any reason
not previously described.

Transaction Management

The Transaction Management area in the sp_sysmon output reports all the transactions-related
activities. This also includes all activities related to user log caches (ULC).

Transaction Management
----------------------

ULC Flushes to Xact Log              per sec   per xact    count   % of total
------------------------------------ -------   --------   ------   ----------
  by Full ULC                            0.0        0.0        0      0.0 %
  by End Transaction                    17.4        1.0     1044     94.4 %
  by Change of Database                  0.0        0.0        1      0.1 %
  by System Log Record                   1.0        0.1       61      5.5 %
  by Other                               0.0        0.0        0      0.0 %
------------------------------------ -------   --------   ------   ----------
Total ULC Flushes                       18.4        1.1     1106

ULC Log Records                         53.6        3.1     3213      n/a
Max ULC Size During Sample               n/a        n/a      488      n/a

ULC Semaphore Requests
  Granted                              108.2        6.3     6494    100.0 %
  Waited                                 0.0        0.0        0      0.0 %
------------------------------------ -------   --------   ------   ----------
Total ULC Semaphore Requests           108.2        6.3     6494

Log Semaphore Requests
  Granted                               18.7        1.1     1119    100.0 %
  Waited                                 0.0        0.0        0      0.0 %
------------------------------------ -------   --------   ------   ----------
Total Log Semaphore Requests            18.7        1.1     1119

Transaction Log Writes                  18.4        1.1     1102      n/a
Transaction Log Alloc                    1.5        0.1       92      n/a
Avg # Writes per Log Page                n/a        n/a  11.97826     n/a

ULC Flushes to Xact (Transaction) Log

ULC Flushes to Xact Log reports how often the user log cache (ULC) was flushed to the transaction log; the count column gives the number of flushes for each cause, and % of total gives each cause's share of all flushes. This is an important part of the sp_sysmon report because it identifies the areas of the application that are driving ULC flushes. The size of the ULC is set with the user log cache size configuration parameter; there is normally one user log cache for each configured user connection.

The user log cache is flushed whenever:

• The user log cache (ULC) becomes full.
• The transaction ends, either by rollback or by commit.
• There is a change of database (a transaction modifies an object in a different database).
• An OAM page allocation happens within a user transaction.
• Other causes occur, which also include writing to disk.

Note When an activity causes a ULC flush, all the records are copied from the user log cache
to the database transaction log.

The sample report gives the total number of all ULC flushes that happened in the Total ULC
Flushes report. This is the number of flushes that occurred during the sample interval.
However, when a database has mixed data and log segments, the ULC is flushed after each
record is added.

By Full ULC

If the Adaptive Server flushes ULC more than once per transaction, the performance can be
affected. This is usually reflected in the high value of the by Full ULC. If the value is greater
than 20 percent for % of total, then the size of the user log cache size parameter should be
increased.

Note If the ULC size increases, the memory required for each user connection also increases.

By End Transaction
If the by End Transaction has a high value, it indicates that the transactions are short and
simple. Again, we need to remember that the sysmon output is only for the sample interval,
and what is reflected in the report is only for that period.

By Change of Database

Whenever a transaction changes databases (modifies an object in a different database), the user log cache is flushed. If this value is high and the ULC is larger than 2K, its size can be decreased.

By System Log Record and By Other

This is another area of the report that guides the choice of ULC size. For instance, if this value is higher than 20 percent and the ULC size is more than 2048 bytes, the ULC can be reduced.

Note If you have a lot of tempdb usage, you may find that this cause (system log record) far outweighs the other flush causes. If so, try separating the data and log segments on tempdb.

Total ULC Flushes

This section of the report gives the count or total number of ULC flushes that occurred during
the sample period that this report was taken.

ULC Log Records

In a development environment, you often need to measure how frequently transaction log records are written; this section of the report helps, because in a controlled environment benchmark analysis can be based on the number of log records written.

Keep in mind that a single data modification can generate many log records. Deletes and deferred updates, for example, can affect many index pages, and in those cases the number of log records rises with the updates or deletes.

Maximum ULC Size

The count column reports the maximum number of bytes used in any single ULC during the sample. Compare it to the user log cache size configuration parameter: if the count consistently stays below the configured value, the parameter can be reduced toward the count displayed in the output. Because ASE flushes the entire ULC when a transaction completes, any configured but unused ULC memory is simply wasted.

Note If the configuration parameter for user log cache size needs to be reduced, it cannot go
lower than 2048 bytes. However, if the number of flushes due to full ULC is more (by
20 percent), then the user log cache size configuration parameter should be increased.

ULC Semaphore Requests

This section reports the number of times a user task was granted a ULC semaphore immediately and the number of times it had to wait for one. It is relevant only in SMP environments.
What is a semaphore? It is a simple internal locking mechanism that prevents a second task from accessing a data structure that is currently in use. ASE uses semaphores to protect the user log caches, since more than one process can access the records of a ULC and force a flush. The following are the subheadings from the output:

• Granted — The number of times that a task was granted a ULC semaphore
immediately on request. There was no contention for the ULC.
• Waited — The number of times a task tried to write to ULC and encountered
semaphore contention.
• Total ULC Semaphore Requests — The total number of ULC semaphore requests
during the sample interval. This includes requests that were granted or had to wait.

Log Semaphore Requests

This section of the report is meaningful for SMP environments only. This reports the
contention for the log semaphore that protects the page.

• Granted — The number of times that a task was granted a log semaphore
immediately after it requested one. % of total reports the percentage of immediately
granted requests as a percentage of the total number of log semaphore requests.
• Waited — The number of times two tasks tried to flush ULC pages to the log
simultaneously and one task had to wait for the log semaphore. % of total reports the
percentage of tasks that had to wait for a log semaphore as a percentage of the total
number of log semaphore requests.
• Total Log Semaphore Requests — The total number of times tasks requested a log
semaphore, including those that were granted immediately and those for which the
task had to wait.

Log Semaphore Contention and User Log Caches

This part of the report displays the contention that is expected in a large number of concurrent
users committing transactions.

Transaction Log Writes

This reports the total number of times that the ASE wrote to the disk from the transaction log.
This usually happens when a transaction commits or when the current log gets full.

Transaction Log Allocations

This section reports the number of times that additional pages were allocated to the transaction log. It is very useful for checking the rate of transaction log growth.

Avg # Writes Per Log Page

This row reports the average number of times each log page was written to disk; the value appears in the count column.
Lock Management

Lock contention can have a large impact on the Adaptive Server. The task context switch due
to logical lock contention already has helped to identify if there are potential lock problems.

Lock Management
---------------

Lock Summary                         per sec   per xact    count   % of total
------------------------------------ -------   --------   ------   ----------
Total Lock Requests                   1330.7      754.9   171354     n/a
Avg Lock Contention                      0.5        0.3       65     0.0 %
Deadlock Percentage                      0.0        0.0        0     0.0 %

Lock Detail                          per sec   per xact    count   % of total
------------------------------------ -------   --------   ------   ----------

Exclusive Table
  Granted                                5.6        3.2      725   100.0 %
  Waited                                 0.0        0.0        0     0.0 %
------------------------------------ -------   --------   ------   ----------
Total EX-Table Requests                  5.6        3.2      725     0.4 %

Shared Table
  Granted                                0.5        0.3       64   100.0 %
  Waited                                 0.0        0.0        0     0.0 %
------------------------------------ -------   --------   ------   ----------
Total SH-Table Requests                  0.5        0.3       64     0.0 %

Exclusive Intent
  Granted                                1.5        0.9      198   100.0 %
  Waited                                 0.0        0.0        0     0.0 %
------------------------------------ -------   --------   ------   ----------
Total EX-Intent Requests                 1.5        0.9      198     0.1 %

Shared Intent
  Granted                                6.3        3.5      805    98.2 %
  Waited                                 0.1        0.1       15     1.8 %
------------------------------------ -------   --------   ------   ----------
Total SH-Intent Requests                 6.4        3.6      820     0.5 %

Lock Summary reports on some of the overview statistics about locking activity during the
sample period. Lock Detail reports on the locks by type and also the data on whether a lock
was granted immediately and how many times a task had to wait for a particular type of lock.
Total Lock Requests reports the total number of locks that were requested.

Average Lock Contention reports on the average number of times that there was a lock
contention as a percentage of the combined lock requests. Deadlock percentage captures the
average number of deadlocks detected as a percentage of the combined lock requests.

Note Whenever I read an sp_sysmon report, I pay special attention to lock management. If average lock contention rises across several samples taken during the day, look more closely at Lock Detail. Likewise, if the Deadlock Percentage increases, study the deadlocks by type for more detail.

There are different types of locks that can be obtained from the sp_sysmon report. They are:

• Exclusive table
• Shared table
• Exclusive intent
• Shared intent
• Last page locks on heaps
• Exclusive page
• Update page
• Shared page
• Shared and exclusive address

In the lock detail sub-report, Granted reports the number of times the lock was allocated
immediately.

Waited reports the number of times the task had to wait to acquire a lock.

Deadlocks happen when the processes wait for each other to release the lock that they are
holding. There is a deadlock detection reporting with the sp_sysmon report.

Note If there were no deadlocks during the sample interval, no detail is reported; Deadlocks by Type displays a zero value for every column.

Deadlocks by Type reports deadlocks broken down by specific type, and Total Deadlocks reports the total number of deadlocks.

Deadlock Search reports on the number of times the Adaptive Server was able to initiate a
deadlock search. Searches Skipped reports the number of times the deadlock search had to be
skipped because the search was already in progress.

Average Deadlock Per Search reports the average number of deadlocks that were found per
search.
Note Whenever a process is blocked by lock contention, it will wait for the time interval set
by the sp_configure parameter deadlock checking period. When this interval elapses, the
process starts checking for the deadlocks. If another process is already doing the
deadlock search, the original process skips the search and waits again.
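For reference, a hedged sketch of adjusting that interval (the value shown is illustrative and is in milliseconds):

-- how long a blocked process waits before it starts a deadlock search
sp_configure "deadlock checking period", 500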

Lock Promotions reports the number of times page locks were escalated to table locks: exclusive page locks to an exclusive table lock, and shared page locks to a shared table lock. Total Lock Promotions reports the average number of lock promotions per second and per transaction. If no lock promotions occurred during the sample interval, this total is the only value displayed.

Index Management
Nonclustered Maintenance             per sec   per xact    count   % of total
------------------------------------ -------   --------   ------   ----------
Ins/Upd Requiring Maint                 20.4        1.2    12269      n/a
  # of NC Ndx Maint                      5.9        0.4     3535      n/a
  Avg NC Ndx Maint/Op                    n/a        n/a   0.28812     n/a

Deletes Requiring Maint                 20.4        1.2    12259      n/a
  # of NC Ndx Maint                      5.9        0.4     3514      n/a
  Avg NC Ndx Maint/Op                    n/a        n/a   0.28665     n/a

RID Upd from Clust Split                 0.0        0.0        0      n/a
  # of NC Ndx Maint                      0.0        0.0        0      n/a

Upd/Del DOL Req Maint                    7.3        0.4     4351      n/a
  # of DOL Ndx Maint                     4.7        0.3     2812      n/a
  Avg DOL Ndx Maint/Op                   n/a        n/a   0.64629     n/a

Page Splits                              0.0        0.0        0      n/a

Page Shrinks                             0.0        0.0        0      n/a

Index Scans                          per sec   per xact    count   % of total
------------------------------------ -------   --------   ------   ----------
  Ascending Scans                       72.2        4.2     4334     99.7 %
  Descending Scans                       0.2        0.0       13      0.3 %
------------------------------------ -------   --------   ------   ----------
Total Scans                             72.5        4.2     4347

Nonclustered Maintenance

Insert, delete, and some update operations can cause page splits, and when a page split occurs the non-clustered indexes must be changed. This section of the report gives the number of indexes that were updated and the average number of indexes maintained per operation.

High values for index maintenance usually mean the system's index maintenance should be assessed, because it can affect overall ASE performance: index maintenance requires additional I/O and locking of the index pages. The need for each index on a table, and its usefulness, should be weighed against the cost of maintaining it.

Ins/Upd Requiring Maint reports the number of insert and update operations to a table with
indexes that potentially require modifications to one or more indexes. # of NC Ndx Maint
reports the number of non-clustered indexes that require maintenance as a result of insert and
update operations. Avg NC Ndx Maint/Op reports the average number of non-clustered
indexes per insert or update operation that require maintenance. For data-only locking tables,
inserts are reported in Ins/Upd Requiring Maint and deletes are reported in Upd/Del DOL Req
Maint.

Deletes Requiring Maintenance

In this section, Deletes Requiring Maint reports the number of delete operations that
potentially require modification to one or more indexes. # of NC Ndx Maint reports the
number of non-clustered indexes that require maintenance as a result of delete operations.
Avg NC Ndx Maint/Op reports the average number of non-clustered indexes per delete
operation that require maintenance.

Row ID Updates from Clustered Split

This section reports index maintenance activity caused by page splits in allpages locking
tables with clustered indexes. These splits require updating the non-clustered indexes for all
of the rows that move to the new data page.

RID Upd from Clust Split reports the total number of page splits that require maintenance of a
non-clustered index. # of NC Ndx Maint reports the number of non-clustered rows that
require maintenance as a result of row ID update operations. Avg NC Ndx Maint/Op reports
the average number of non-clustered index entries that were updated for each page split.

Update/Delete DOL Requiring Maintenance

The data in this section gives information about how updates and deletes affect indexes on
data-only locking tables.

Upd/Del DOL Req Maint reports the number of update and delete operations that potentially
require modification to one or more indexes. # of DOL Ndx Maint reports the number of
indexes that require maintenance as a result of update or delete operations. Avg DOL Ndx
Maint/Op reports the average number of indexes per update or delete operation that require
maintenance.

Page Splits

Page Splits reports the number of page splits for data pages, clustered index pages, or non-
clustered index pages when there is not enough room for a new row.

Especially with the clustered index, the row must be in physical order on the pages, and if
there is not room, ASE will split the page. This does incur overhead because the parent index
page needs to be updated. For clustered index page splits, all non-clustered indexes that point
to the rows on the new page also need to be updated.

Reducing Page Splits for Ascending Key Inserts

If Page Splits is high and your application is inserting values into an allpages locking table
with a clustered index on a compound key, it may be possible to reduce the number of page
splits through a special optimization called ascending key inserts that changes the page split
point for these indexes. The special optimization is designed to reduce page splitting and to
result in more completely filled data pages. This affects only clustered indexes with
compound keys, where the first key is already in use in the table and the second column is
based on increasing value.

Note If tables sometimes experience random inserts and have more ordered inserts during
batch jobs, it is better to enable dbcc tune (ascinserts) only for the period during which
the batch job runs.
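For example, a hedged sketch of toggling the optimization around a batch window (the table name ordertab is a placeholder):

-- enable ascending-key insert optimization for the batch run
dbcc tune(ascinserts, 1, "ordertab")

-- turn it back off once the ordered batch inserts are finished
dbcc tune(ascinserts, 0, "ordertab")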

Retries and Deadlocks

Adaptive Server has a mechanism called deadlock retries that attempts to avoid transaction
rollbacks caused by index page deadlocks. Retries reports the number of times Adaptive
Server used this mechanism.

Deadlocks on index pages can occur when two transactions each need to hold a lock on the same index page. By the time such a deadlock is detected, the transactions have usually already made updates, so rolling one of them back is expensive. For deadlocks caused by page splits and shrinks, however, Adaptive Server can drop the index locks and restart the index scan, which allows both transactions to succeed.

By doing this, the other transaction will usually have completed by the time the index page that needs the page split is reached again.

Note By default, any index deadlock that is due to a page split or shrink will be tried up to
five times before the transaction is considered deadlocked and is rolled back.

A high number of index deadlocks and deadlock retries indicate high contention in a small
area of the index B-tree.
Page Shrinks

Page Shrinks reports the number of times that deleting index rows caused the index to shrink.
Shrinks incur overhead due to locking in the index and the need to update pointers on adjacent
pages.

Note If the count value is greater than 0, there may be many pages in the index with fairly
small numbers of rows per page due to delete and update operations.

Consider rebuilding the index if the page shrinks are very high.
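One hedged sketch of such a rebuild for a data-only-locked table (the table name ordertab is a placeholder; for allpages-locked tables you would drop and re-create the index instead):

-- rebuild the indexes (and compact the data) on a data-only-locked table
reorg rebuild ordertab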

Index Scans

The Index Scans section reports forward and backward scans by lock scheme.

Ascending Scans reports the number of forward scans on allpages locking tables. DOL
Ascending Scans reports the number of forward scans on data-only locking tables.
Descending Scans reports the number of backward scans on allpages locking tables. DOL
Descending Scans reports the number of backward scans on data-only locking tables.

Disk I/O Management

This section of the report covers four potential disk problem areas that are otherwise invisible to the user unless sp_sysmon reports them within the sample interval. They are:

• Outstanding I/Os
• Requested I/Os
• Delayed I/Os
• Master device and named device activity

Disk I/O Management
-------------------

Max Outstanding I/Os                 per sec   per xact    count   % of total
------------------------------------ -------   --------   ------   ----------
  Server                                 n/a        n/a        0      n/a
  Engine 0                               n/a        n/a        1      n/a
  Engine 1                               n/a        n/a        1      n/a
  Engine 2                               n/a        n/a        4      n/a
  Engine 3                               n/a        n/a        2      n/a
  Engine 4                               n/a        n/a        2      n/a
  Engine 5                               n/a        n/a        3      n/a

I/Os Delayed by
  Disk I/O Structures                    n/a        n/a        0      n/a
  Server Config Limit                    n/a        n/a        0      n/a
  Engine Config Limit                    n/a        n/a        0      n/a
  Operating System Limit                 n/a        n/a        0      n/a

Total Requested Disk I/Os               20.2        1.2     1210

Completed Disk I/Os
  Engine 0                               0.1        0.0        7      0.6 %
  Engine 1                               0.1        0.0        4      0.3 %
  Engine 2                               1.2        0.1       72      6.0 %
  Engine 3                              17.1        1.0     1028     85.0 %
  Engine 4                               1.6        0.1       94      7.8 %
  Engine 5                               0.1        0.0        5      0.4 %
------------------------------------ -------   --------   ------   ----------
Total Completed I/Os                    20.2        1.2     1210

Device Activity Detail
----------------------
Device: /dev/vg00/rlv030.sp01.sysprocs
sysprocsdev                          per sec   per xact    count   % of total
------------------------------------ -------   --------   ------   ----------
Total I/Os                               0.0        0.0        0      n/a
------------------------------------ -------   --------   ------   ----------
Total I/Os                               0.0        0.0        0      0.0 %
=============================================================================

The information in the Max Outstanding I/Os area can help configure I/O parameters at the
server or operating system level if any of the I/Os Delayed By values are non-zero.

Max Outstanding I/Os

Max Outstanding I/Os reports the maximum number of I/Os pending for Adaptive Server as a whole, and for each engine, at any point during the sample period. When the system exceeds the number of available disk I/O control blocks, I/O is delayed, because Adaptive Server requires that a task obtain a disk I/O control block before initiating an I/O request.

Disk I/O Structures reports on the number of I/Os delayed by reaching the limit on disk I/O
structures. There are some operating systems that limit the number of asynchronous disk I/Os
per system or per process. If an application exceeds this limit, the OS returns an error
message. Since this impacts performance, it is inefficient for Adaptive Server to attempt
performing an asynchronous I/O if the OS is going to reject it.

Server Config Limit reports the number of I/Os delayed until enough outstanding I/Os have
completed to fall below the max asynchronous I/Os per server limit. Engine Config Limit
reports the number of I/Os delayed until enough outstanding I/Os have completed to fall
below the max asynchronous I/Os per engine limit.

The OS kernel has a per process and a per system limit on the maximum number of
asynchronous I/Os that either a process or the entire system can have pending at any point of
time. Operating System Limit reports on the number of I/Os delayed because the system has
exceeded the operating system limit.

The Requested and Completed disk I/Os values can be affected by two factors:

• I/Os requested before the sample interval began and completed during the sample
interval
• I/Os requested during the sample interval but not completed before the interval ended

In either case, the requested and completed counts will differ. Take additional samples to get a more accurate picture.

Total Requested Disk I/Os reports the number of times Adaptive Server requested disk I/Os. Completed Disk I/Os reports, for each engine, the number of I/Os that engine completed; the % of total column gives each engine's share of the I/Os completed by all engines combined. Total Completed I/Os reports the number of disk I/Os completed by all Adaptive Server engines.

Device Activity Detail

Every device has its own queue of pending I/O. Whenever Adaptive Server performs an I/O, the task must hold the device's semaphore, a spinlock taken for each device list, while it links the I/O structure onto that device's queue. Multiple Adaptive Server engines can try to post I/Os to the same device simultaneously, which can create contention for that semaphore.

There is a sub-report on the reads for every device. Reads reports on the number of reads to
the master or a named device. Writes reports on the number of writes to the master or a named
device. Total I/Os reports the combined number of reads and writes to a master or named
device. Device Semaphore Granted reports on the number of times that a request for a device
spinlock was granted immediately. Device Semaphore Waited reports on the number of times
that a requested device spinlock was busy and the task had to wait for the spinlock to be
released.

Data Cache Management

In this report, the sp_sysmon reports on:

• Spinlock contention
• Utilization
• Cache searches, including hits and misses
• Pool turnover for all configured pools
• Buffer wash behavior, including buffers passed clean, buffers already in I/O, and
buffers washed dirty
• Prefetch requests performed and denied
• Dirty read page requests
The data can be further analyzed using sp_cacheconfig and sp_helpcache.

Data Cache Management
---------------------

Cache Statistics Summary (All Caches)
-------------------------------------
                                     per sec   per xact    count   % of total

Cache Search Summary
  Total Cache Hits                     560.1       32.6    33606     97.9 %
  Total Cache Misses                    12.1        0.7      723      2.1 %
------------------------------------ -------   --------   ------   ----------
Total Cache Searches                   572.2       33.3    34329

Cache Turnover
  Buffers Grabbed                        1.5        0.1       92      n/a
  Buffers Grabbed Dirty                  0.0        0.0        0      0.0 %

Cache Strategy Summary
  Cached (LRU) Buffers                 554.4       32.3    33266    100.0 %
  Discarded (MRU) Buffers                0.0        0.0        0      0.0 %

Large I/O Usage
  Large I/Os Performed                   0.8        0.0       46    100.0 %
  Large I/Os Denied                      0.0        0.0        0      0.0 %
------------------------------------ -------   --------   ------   ----------
Total Large I/O Requests                 0.8        0.0       46

Large I/O Effectiveness
  Pages by Lrg I/O Cached                0.0        0.0        0      n/a

Asynchronous Prefetch Activity           0.0        0.0        0      n/a

Other Asynchronous Prefetch Statistics
  APFs Used                              0.0        0.0        0      n/a
  APF Waits for I/O                      0.0        0.0        0      n/a
  APF Discards                           0.0        0.0        0      n/a

Dirty Read Behavior
  Page Requests                          2.3        0.1      139      n/a
-----------------------------------------------------------------------------
Cache: default data cache            per sec   per xact    count   % of total
------------------------------------ -------   --------   ------   ----------
  Spinlock Contention                    n/a        n/a      n/a      0.0 %
  Utilization                            n/a        n/a      n/a    100.0 %

Cache Searches
  Cache Hits                           560.1       32.6    33606     97.9 %
    Found in Wash                        2.2        0.1      130      0.4 %
  Cache Misses                          12.1        0.7      723      2.1 %
------------------------------------ -------   --------   ------   ----------
Total Cache Searches                   574.4       33.4    34459

Pool Turnover
  2 KB Pool
    LRU Buffer Grab                      1.5        0.1       92    100.0 %
    Grabbed Dirty                        0.0        0.0        0      0.0 %
------------------------------------ -------   --------   ------   ----------
Total Cache Turnover                     1.5        0.1       92

Buffer Wash Behavior
  Statistics Not Available - No Buffers Entered Wash Section Yet

Cache Strategy
  Cached (LRU) Buffers                 554.4       32.3    33266    100.0 %
  Discarded (MRU) Buffers                0.0        0.0        0      0.0 %

Large I/O Usage
  Large I/Os Performed                   0.8        0.0       46    100.0 %
  Large I/Os Denied                      0.0        0.0        0      0.0 %
------------------------------------ -------   --------   ------   ----------
Total Large I/O Requests                 0.8        0.0       46

Large I/O Detail
  4 KB Pool
    Pages Cached                         0.0        0.0        0      n/a
    Pages Used                           0.0        0.0        0      n/a
  16 KB Pool
    Pages Cached                         0.0        0.0        0      n/a
    Pages Used                           0.0        0.0        0      n/a

Dirty Read Behavior
  Page Requests                          2.3        0.1      139      n/a

Cache Search Summary

This part of the report provides summary information about cache management, such as the
number of cache hits and misses.

Total Cache Hits reports the number of times that a needed page was found in any cache. % of
total reports the percentage of cache hits as a percentage of the total number of cache
searches.

Total Cache Misses reports the number of times that a needed page was not found in a cache
and had to be read from disk. % of total reports the percentage of times that the buffer was not
found in the cache as a percentage of all cache searches.

Total Cache Searches reports the total number of cache searches, including hits and misses for
all caches combined.

Cache Turnover

This section provides a summary of cache turnover.

Buffers Grabbed reports the number of buffers that were replaced in all of the caches. The
count column represents the number of times that Adaptive Server fetched a buffer from the
LRU end of the cache, replacing a database page. If the server was recently restarted, and if
all the buffers are empty, reading a page into an empty buffer is not counted here. Buffers
Grabbed Dirty reports the number of times that fetching a buffer found a dirty page at the
LRU end of the cache and had to wait while the buffer was written to disk. If this value is not
zero, it can indicate a serious performance hit.

Cache Strategy Summary

Cached (LRU) Buffers reports the total number of buffers placed at the MRU end of the MRU/LRU chain in all caches. Discarded (MRU) Buffers reports the total number of buffers in all caches that were read in using the fetch-and-discard strategy.

Large I/O Usage

This section provides summary information about the large I/O requests in all caches.
Usually, if Large I/Os Denied is high, the individual caches need to be investigated.

Large I/Os Performed measures the number of times that the requested large I/O was
performed. % of total is the percentage of large I/O requests performed as a percentage of the
total number of I/O requests made. Large I/Os Denied reports the number of times that large
I/O could not be performed. % of total reports the percentage of large I/O requests denied as a
percentage of the total number of requests made. Total Large I/O Requests reports the number
of all large I/O requests (both granted and denied) for all caches.

Large I/O Effectiveness

This section of the report discusses the effectiveness of large I/O. It compares the number of pages that were brought into the cache by large I/Os to the number of those pages actually used while in the data cache.

If Pages by Lrg I/O Used is low, it means that few of the pages brought into the cache are
accessed by the queries.

Note If large I/O effectiveness needs to be investigated for individual tables and indexes, optdiag can be used.
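A hedged example of such an optdiag run from the operating system prompt (the database, table, and server names are illustrative):

optdiag statistics pubs2..titles -Usa -Ppassword -SSERVER_NAME -o titles.opt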

Pages by Lrg I/O Cached gives the number of pages that were brought into all caches by
every large I/O operation. This is relevant only for the sample interval during the report.

Note A low value for this column can indicate either allocation fragmentation in the table's storage or that the large I/O caching strategy is not being used effectively.

Total APFs Requested reports the total number of pages eligible to be prefetched.

APFs Issued/Denied

This section of the report gives the number of asynchronous prefetch requests issued by the
system for the duration of the sample interval.

The APFs Denied Due To heading reports on APFs that were not issued. The APF I/O Overloads heading reports on APFs that were denied because of disk I/O limits.

Note Take a closer look at disk I/O management, since this can be caused by a lack of disk I/O structures or by disk semaphore contention.

APF Limit Overloads indicates that the percentage of buffer pools that can be used for
asynchronous prefetch was exceeded.

APF Reused Overloads indicates that APF usage was denied due to a missed page chain or
because the buffers brought in by APF were swapped out before they could be accessed.

APF Buffers Found in Cache

This section reports how many buffers from APF look-ahead sets were found in the data
cache during the sample interval. Asynchronous prefetch tries to find a page it needs to read
in the data cache using a quick scan without holding the cache spinlock. If that does not
succeed, it then performs a thorough scan holding the spinlock.

In addition to that, three additional asynchronous prefetch statistics are reported:


• APFs Used reports the number of pages that were brought into the cache by
asynchronous prefetch and used during the sample interval.
• APF Waits for I/O reports the number of times that a process had to wait for an
asynchronous prefetch to complete.
• APF Discards indicates the number of pages that were read in by asynchronous
prefetch and discarded before they were used.

Note If the value for APF Discards is high, it may indicate that the buffer pools should be larger, or that APF is bringing pages into the cache that are not actually required.

Dirty Read Behavior

Dirty reads happen at isolation level 0. Page Requests reports the average number of pages
that were requested at isolation level 0, and the % of total column reports the percentage of
dirty reads compared to the total number of page reads.

Note There is significant overhead when a dirty read has to restart. A dirty read restart usually happens when a page that has already been read at isolation level 0 is changed by another process and the page is deallocated, forcing the scan to start over.

Cache Management by Cache

This section reports cache utilization for each active cache on the server. The sample output shows results for the default data cache. The following text explains the per-cache statistics.

Spinlock Contention

Spinlock Contention reports the number of times an engine encountered spinlock contention on the cache and had to wait, expressed as a percentage of the total spinlock requests for that cache. This statistic is meaningful in SMP environments.

When a user task makes any changes to a cache, a spinlock denies all other tasks access to the
cache while the changes are being made. Although spinlocks are held for extremely brief
durations, they can slow performance in multi-processor systems with high transaction rates.
If spinlock contention is more than 10 percent, consider using named caches or adding cache
partitions.
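A hedged sketch of both remedies follows; hot_cache, pubs2, and titles are illustrative names, and the values are examples rather than recommendations:

-- Increase the number of cache partitions server-wide (illustrative value).
sp_configure "global cache partition number", 4
go
-- Or create a named cache and bind a hot object to it.
sp_cacheconfig "hot_cache", "100M"
go
sp_bindcache "hot_cache", pubs2, titles
go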

Utilization

Utilization reports the percentage of searches using this cache as a percentage of searches
across all caches. This report helps to determine the over- or underutilized caches.

If you decide that a cache is not well utilized, you can eliminate the cache and allow the
cache’s objects to use the default data cache.

Note If a cache is poorly utilized, cache bindings can be changed to balance utilization, or the cache can be resized to match its actual usage.
Cache Search, Hit, and Miss Information

This section reports the number of hits and misses and total number of searches for the cache.

Note The number displayed in the sp_sysmon report is always higher than the statistics io
report because sp_sysmon also gives the I/O for system tables, logs, and OAM pages.

Cache Hits

Cache Hits reports the number of times that a needed page was found in the cache during a search; % of total gives cache hits as a percentage of total cache searches. Found in Wash reports how many of those hits were found in the wash area.

Note If a large percentage of cache hits are found in the wash area, the wash area is probably too big and should be reduced.

Also keep in mind that as buffers cross the wash marker, ASE starts writing the dirty pages to disk, which means more physical I/O. If the cache uses the fetch-and-discard strategy for non-APF I/O, there is a greater chance of finding the data in the wash area.

Cache Misses

Cache Misses reports the number of times that the required data was not found in the data cache and had to be read from disk. This affects performance: the more I/O required to read data from disk, the slower the response.

Pool Turnover

Pool Turnover reports the number of times that a buffer is replaced from each pool in a cache. Every cache can have up to four pools, with 2K, 4K, 8K, and 16K I/O sizes.

LRU Buffer Grab is incremented whenever a page is replaced by another page.

Grabbed Dirty gives the number of dirty buffers that reached the LRU end before they could be written to disk. If the value for Grabbed Dirty is not zero, the wash area of the pool may be too small. Total Cache Turnover gives the number of buffers grabbed in all pools in the cache.
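If you need to enlarge a pool's wash area, sp_poolconfig is the tool; a hedged sketch (the pool and size shown are illustrative):

-- Raise the wash size of the 2K pool in the default data cache.
-- See the System Administration Guide for sizing rules.
sp_poolconfig "default data cache", "2K", "wash=2M"
go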

Buffer Wash Behavior

The wash area should be large enough to allow I/O to be completed on the dirty buffers before
they can reach the LRU. It is good to keep checking the per second values for Buffers Already
in I/O and Buffers Washed Dirty in the Buffer Wash Behavior section.

Whenever a buffer reaches the wash marker, it can be in one of three states:

• Buffers Passed Clean: the number of buffers that were clean when they passed the wash marker
• Buffers Already in I/O: the number of buffers that were already in I/O when they reached the wash marker
• Buffers Washed Dirty: the number of times that a buffer entered the wash area dirty and not already in I/O
Cache Strategy

Cached (LRU) Buffers reports the number of buffers that used the normal cache strategy and were placed at the MRU end of the cache. This includes all buffers read directly from disk and placed at the MRU end, as well as all buffers that were found in cache; at the completion of the logical I/O, the buffer is placed at the MRU end of the cache.

Discarded (MRU) Buffers are buffers placed at the wash marker using the fetch-and-discard
strategy.

Large I/O Usage

Large I/O Usage gives the data about ASE prefetch requests for large I/O. Large I/Os
Performed measures the number of times that a requested large I/O was performed. Large
I/Os Denied gives the number of times that the large I/O could not be performed.

If a cache contains a large I/O pool and queries perform both 2K and 16K I/O on the same
objects, there will always be some percentage of large I/Os that cannot be performed because
pages are in the 2K pool.

Total Large I/O Requests provides summary statistics for large I/Os performed and denied.

Large I/O Detail

Large I/O Detail gives detailed information for each pool individually.

Pages Cached prints the total number of pages read into the cache. Pages Used reports the
number of pages used by a query while in cache.

Page Requests reports the average number of pages requested at isolation level 0.

The % of total output for Dirty Read Page Requests shows the percentage of dirty reads for
the total number of page reads.

Procedure Cache Management


Procedure Cache Management            per sec   per xact    count   % of total
----------------------------------    -------   --------   ------   ----------
  Procedure Requests                      9.1        0.5      548         n/a
  Procedure Reads from Disk               0.0        0.0        0        0.0 %
  Procedure Writes to Disk                0.0        0.0        0        0.0 %
  Procedure Removals                      0.0        0.0        0         n/a
=============================================================================

Memory Management                     per sec   per xact    count   % of total
----------------------------------    -------   --------   ------   ----------
  Pages Allocated                         0.0        0.0        1         n/a
  Pages Released                          0.0        0.0        1         n/a

Procedure Requests reports the number of times stored procedures were executed.

Procedure Reads from Disk reports the number of times that stored procedures were read from disk rather than found in the procedure cache. % of total reports the percentage of procedure reads from disk as a percentage of the total number of procedure requests. If this is a relatively high number, it could indicate that the procedure cache is too small.

Procedure Writes to Disk reports the number of procedures created during the interval. This
can be significant if application programs generate stored procedures.

Procedure Removals reports the number of times that a procedure aged out of cache.
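If Procedure Reads from Disk suggests the procedure cache is too small, start by checking how it is currently sized. Parameter names vary slightly by ASE release, so the fragment match below, which lists every matching configuration parameter, is a safe starting point:

sp_configure "procedure cache"
go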

Network I/O Management


Network I/O Management
----------------------

  Total Network I/O Requests             14.5        0.8      869         n/a
  Network I/O Delayed                     0.0        0.0        0        0.0 %

  Total TDS Packets Received          per sec   per xact    count   % of total
  ----------------------------------  -------   --------   ------   ----------
    Engine 0                              0.2        0.0       11        3.8 %
    Engine 1                              0.6        0.0       35       12.1 %
    Engine 2                              0.6        0.0       35       12.1 %
    Engine 3                              2.6        0.2      158       54.5 %
    Engine 4                              0.8        0.0       49       16.9 %
    Engine 5                              0.0        0.0        2        0.7 %
  ----------------------------------  -------   --------   ------   ----------
  Total TDS Packets Received              4.8        0.2      290

  Total Bytes Received                per sec   per xact    count   % of total
  ----------------------------------  -------   --------   ------   ----------
    Engine 0                              5.3        0.3      319        0.5 %
    Engine 1                             27.5        1.6     1652        2.4 %
    Engine 2                             23.3        1.4     1400        2.0 %
    Engine 3                            881.6       51.3    52894       76.5 %
    Engine 4                            214.5       12.5    12871       18.6 %
    Engine 5                              0.7        0.0       44        0.1 %
  ----------------------------------  -------   --------   ------   ----------
  Total Bytes Received                 1152.9       67.1    69180

  Avg Bytes Received per Packet           n/a        n/a      238         n/a
-----------------------------------------------------------------------------
  Total TDS Packets Sent              per sec   per xact    count   % of total
  ----------------------------------  -------   --------   ------   ----------
    Engine 0                              0.2        0.0       11        1.9 %
    Engine 1                              0.6        0.0       35        6.0 %
    Engine 2                              1.9        0.1      112       19.2 %
    Engine 3                              2.6        0.2      156       26.8 %
    Engine 4                              4.4        0.3      266       45.7 %
    Engine 5                              0.0        0.0        2        0.3 %
  ----------------------------------  -------   --------   ------   ----------
  Total TDS Packets Sent                  9.7        0.6      582

  Total Bytes Sent                    per sec   per xact    count   % of total
  ----------------------------------  -------   --------   ------   ----------
    Engine 0                             49.9        2.9     2992        1.4 %
    Engine 1                             24.3        1.4     1459        0.7 %
    Engine 2                            687.4       40.0    41243       19.2 %
    Engine 3                            846.5       49.3    50789       23.7 %
    Engine 4                           1967.8      114.5   118069       55.0 %
    Engine 5                              1.3        0.1       78        0.0 %
  ----------------------------------  -------   --------   ------   ----------
  Total Bytes Sent                     3577.2      208.2   214630

  Avg Bytes Sent per Packet               n/a        n/a      368         n/a

Total Network I/O Requests reports the total number of packets received and sent. If Adaptive
Server receives a command that is larger than the packet size, Adaptive Server waits to begin
processing until it receives the full command. Therefore, commands that require more than
one packet are slower to execute and take up more I/O resources.

Note The network packet size for all the connections can be configured. Some connections
can be configured for larger packet sizes.
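To see what the server currently allows, list the matching configuration parameters; sp_configure with a name fragment displays every parameter containing it, which should include both default network packet size and max network packet size:

sp_configure "network packet size"
go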

Network I/Os Delayed reports the number of times the I/O was delayed. If this value is consistently non-zero, investigate the network.

Total TDS Packets Received reports the number of TDS packets received per engine.

Total Bytes Received reports the number of bytes received per engine.

Avg Bytes Rec’d per Packet reports the average number of bytes for all packets received
during the sample interval.

Total TDS Packets Sent reports the number of packets sent by each engine and a total for the
server as a whole.

Total Bytes Sent reports the number of bytes sent by each Adaptive Server engine and the
server as a whole during the sample interval.

Avg Bytes Sent per Packet reports the average number of bytes for all packets sent during the
sample interval.

Note Certain TDS messages that are sent after each statement can be suppressed by tuning. Turning off "done in proc" messages can increase throughput slightly in some environments.

Syntax:

To turn it off:

dbcc tune (doneinproc, 0)

To turn the messages on:

dbcc tune (doneinproc, 1)

Summary

Performance monitoring using sp_sysmon should be done at different times of the day. The results in a report are only valid for the sampled interval, so monitoring with sp_sysmon is not a one-time execution; it needs to be run several times so that issues can be identified by looking for consistent patterns. Once a problem is identified, apply the changes and repeat the same benchmark analysis.
When tuning Adaptive Server, the fundamental measures of success appear as increases in
throughput and reductions in application response time. Unfortunately, tuning Adaptive
Server cannot be reduced to printing these two values. In most cases, your tuning efforts must
take an iterative approach, involving a comprehensive overview of Adaptive Server activity,
careful tuning and analysis of queries and applications, and monitoring locking and access on
an object-by-object basis.

Note Many shops run sp_sysmon for five minutes of every ten and aggregate the information.
This is great trending information and does not seem to impact performance measurably.
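For reference, a single five-minute sample is collected as shown below; run it from a scheduled isql job and capture the output to a file for trending.

sp_sysmon "00:05:00"
go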

Chapter 16: Network Performance


Obtaining optimal performance from the network can involve tuning one or more of the
following: the operating system, network (LAN/WAN) layout, machine configurations, and
ASE configuration. The purpose of this chapter is to provide you with tips and suggestions
that help you maximize the performance of your network.

How Sybase ASE Handles Client Connections

Let’s start by reviewing how client connections are handled in ASE. Each client connection is
a process within ASE that is ultimately assigned a unique process ID — the spid. Each
connection uses the socket interfaces to communicate with ASE. A task within ASE, a
scheduler, periodically wakes up and checks the listen port to determine if any new
connection attempts have been made; the listen port corresponds to the master entry in your
interfaces file. If there is some data on the port, a network check routine is called to accept the
connection. The connection process is as follows:

• Obtain a socket from the operating system for each new connection. If the socket is
unavailable, write an error message (1605) to the server errorlog.
• Create a new virtual socket for the client connection, setting all appropriate options for
the socket.
• The connection handler accepts the login packet from the client.
• If the login is successful, the socket is now in receive mode and requests (query/proc)
are accepted.
• The task is put on the runnable queue. The results from the server are returned on the
same socket file descriptor to the client.
• For each engine configured, there will be a separate network handler created as a
process within ASE.

Adaptive Server is one of the first database management systems built on a network-based
client/server architecture. The server has an event-driven scheduler and network handler along
with various internal processes. Because of this architecture, you may not see the client
process as a separate process on the host machine; however, the client connection is seen as
using a socket on the host machine. This client server communication over the socket occurs
via a series of data packets. Adaptive Server uses the Tabular Data Stream (TDS) protocol on top of the underlying network protocol. The most common network protocols are TCP/IP, IPX/SPX, NetBIOS, and so on; not all of them are supported by Adaptive Server.
The TDS packet has a header and a data portion, so if the network packet size is, say, 512 bytes, the actual data carried in a single packet is less than 512 bytes. Since TDS rides on top of the operating system's network packet, the amount of data in each packet is reduced further.

Due to the use of sockets, Adaptive Server does not depend upon the operating system to
schedule its tasks or client requests. It also reduces the number of processes running on the
host operating system, and this can be of substantial advantage in some situations.

It may happen that the client connection is abnormally disconnected or interrupted (similar to pressing Ctrl+C or the Delete key or turning off the PC, which happens more often than anyone can imagine) before the application completes successfully. The occurrence of such an
event sends urgent data or out-of-band data to the socket connection. This data is always
processed ahead of regular data on the socket file descriptor. “Out-of-band” is used because it
appears that the data is coming from another socket, while it really comes on the same socket
file descriptor, except that the header field contains a special bit to indicate it is out-of-band
data.

How to Identify Network Performance Problems

Typical indications of network performance issues are:

• Queries show transient fluctuations in response time for no apparent reason. Problem
resolves itself with no change to application, query, or data structure.
• Performance and throughput are very sensitive to overall network load/activity. You
notice unexplained swings in throughput.
• Server experiences significant spikes in disconnect and timeout errors.
• Application processes begin to fail due to time slice errors.

As with other performance and tuning, care should be taken while tuning network
performance problems, since there are often multiple potential causes for every real problem.
You must try to determine whether the source of the problem is the operating system, the
network, ASE, the application itself, or some combination of them. Try to get as much
detailed information as possible. Take a look at the actual SQL code involved; maybe it is just
poorly written or has recently been recompiled. Examine the database structures involved, especially the indexes; run optdiag to check how current the statistics are, and look for patterns. If the
problem only occurs at specific times or on certain days, have an analysis done on the
network identifying what is running at that time. Have your network people do an analysis on
the total network capacity as well as what’s actually available at that time. You will be
surprised what can happen to your performance when a brand new data feed is introduced into
an already capacity constrained network. Has your performance changed right after some
network maintenance was done? Check how many routers are between the server and the
client. When was the last time the server’s network interface cards (NIC) were checked?
Always work very closely with your network people to leverage their expertise and
knowledge of the infrastructure.

Possible Causes of Network Performance Problems

There are some network configuration parameters included in Adaptive Server in addition to
those parameters in the operating system. It is not my intention to go into the details of all
operating system-related parameters; however, the most common configuration parameters
that can have an effect on performance are mentioned here. Before tuning network
performance, make sure other performance-related issues are checked and that you completed
all possible adjustments in those areas. Changing network-related parameters on the OS may
affect other processes running on the same machine, so changes should be made carefully. If
Sybase ASE is the main or only process running on the machine, then you have more freedom
to experiment. Make sure that parameters are changed one at a time to find out the positive
and/or negative impact of the change. Clearly document the change made and the impact on
performance. It would be better to take a few samples to average out results.

The network can be checked using various tools available on a particular platform. On most
of the UNIX machines and on NT, the simplest way to check for some network problems is to run the netstat command, for example netstat -i 15:

shashi% netstat -i 15
          input   hme0        output              input  (Total)     output
packets   errs    packets errs  colls    packets   errs    packets errs  colls
499056570 1629  534426980   0  61507910  499683959 1629  535054369   0  61507910
64        0     70          0  0         64        0     70          0  0
73        0     42          0  0         73        0     42          0  0
73        0     42          0  0         73        0     42          0  0
71        0     25          0  0         71        0     25          0  0
69        0     26          0  0         69        0     26          0  0
52        0     15          0  2         52        0     15          0  2
61        0     25          0  0         61        0     25          0  0
70        0     39          0  1         70        0     39          0  1
63        0     22          0  0         63        0     22          0  0
62        0     15          0  0         62        0     15          0  0
188       0     135         0  6         188       0     135         0  6

• In the netstat -i 15 display, a machine with active network traffic should show both
input packets and output packets continually increasing. Here, parameter 15 indicates
a sample should be taken every 15 seconds. Based upon your particular setup, you can
change the sampling rate.
• The first row in this output shows data since the start of the host machine.
• The network collision rate is calculated by dividing the number of output collision
counts by the number of output packets. A collision rate of greater than 10 percent on
the entire network can indicate an overloaded network, a poorly configured network,
or hardware problems.
• The input packet error rate is calculated by dividing the number of input errors by the
total number of input packets.

Most likely the host machine is dropping the packets if the input error rate is more than 20
percent. Transmission problems can be caused by other hardware on the network, as well as
heavy traffic and low-level hardware problems. Bridges and routers can drop packets, forcing
retransmissions and causing degraded performance. The TCP/IP protocol guarantees delivery
of the packet, hence any increase in collision or dropping of the packet will certainly increase
load on the network, since error packets or dropped packets need to be sent again over the
network. This increases the load on the network, causing further problems with collision and
dropping of packets. This vicious circle stabilizes once the overall load on the network
reduces and the proper maintenance action is performed.

Sometimes it may be important to find out how long a round-trip echo packet takes. Type ping -sRv servername from the client to show the round-trip time and the route taken by the packets.

If the round trip takes more than a few milliseconds, there are slow routers on the network or
the network is very busy. Ignore the results from the first ping command. The ping -sRv
command also displays packet losses.

The following shows the output of the ping -sRv command:

client% ping -sRv servername


PING server: 56 data bytes
64 bytes from server (129.145.72.15): icmp_seq=0. time=5. ms
IP options: <record route> router (129.145.72.1), server
(129.145.72.15), client (129.145.70.114), (End of record)
64 bytes from server (129.145.72.15): icmp_seq=1. time=2. ms
IP options: <record route> router (129.145.72.1), server
(129.145.72.15), client (129.145.70.114), (End of record)

A description of the arguments to the ping command follows:

  Argument   Description
  --------   ------------------------------------------------------------
  s          Send. One packet is sent per second and one output line is
             printed for every response received to the echo. No output
             is produced if there is no response.
  R          Record route. This option is passed to the IP layer so that
             the record-route option in the IP header is set; this helps
             in retrieving route information from the IP header.
  v          Verbose. When this option is turned on, packets other than
             echo responses are also listed.

Use ping -sRv to find the response time of several hosts on the given network if you suspect a
physical problem. If the response time (ms) from one host is not what you expect, investigate
that host using tools like netstat, sniffer, snoop, or similar tools available on that particular
platform.

The ping command uses the ICMP protocol’s echo request packets to elicit an ICMP echo
response from the specified host or network gateway. It can take a long time on a time-shared
NFS server to obtain the ICMP echo. The distance from the client to the NFS server is a factor
for how long it takes to obtain the ICMP echo from the server.

These techniques are given here as a guideline since troubleshooting network problems is an
art as well as a science. It requires considerable knowledge of the underlying operating
system, hardware, network routers, and software involved to properly tune the network in a
complex environment. Again, work closely with your network people whenever you believe
that the network is the cause.

Each network is configured for a specific size, and the underlying hardware also has limits
governing the amount of data it can transmit. For example, a T1 line can carry a large amount
of data, but a fiber optic connection can carry even more. While it is possible that your
processing and data volume exceeds the capacity of your network, this is typically very rare.
Again, you should discuss this with your network support group and get some hard numbers
about the network’s capacity (bandwidth) and peak usage.

If you determine that you do, in fact, have a capacity constraint, you can take the following
corrective actions to improve performance:

• Review the default size of your network packet, which is set with the network packet
size configuration parameter. Use sp_sysmon to see whether it is sized appropriately
by comparing the packet size to the average bytes sent per packet.
• Check the setting for the tcp no delay parameter. Activating this option will increase
network traffic since packets are sent regardless of size. Normally, TCP batches small
logical packets into a larger physical packet; this is ASE’s default behavior. Check to
see what your server is set to.
• On some UNIX systems, you can change the NFS read and write buffer sizes by editing the /etc/vfstab file. For example, the following entry can be added to this file:

Server:/home /home/server nfs rw rsize=2048,wsize=2048 0 0

The above line sets the NFS read and write buffer sizes to 2048 bytes.

Network performance may be affected if the client application transmits excessive amounts of data over the network. Instead of sending the entire SQL query over the network, use stored procedures wherever possible. That way you transmit only the procedure name and its parameters rather than the entire query text, especially if the query contains the IN clause from hell. The same holds true for the data being sent back to the client; send just what they need whenever possible.

• Changing the network parameters mentioned above can improve performance. If you
change/upgrade your OS, some default values at the OS level may also affect
performance.
• You can reduce network traffic by controlling the number of connects and disconnects that the application performs. Connection and disconnection processing
sends more data through the network and requires more activity from the ASE and the
operating system. For example, if an application issues a connection request to
Adaptive Server, it first sends a request to the operating system to get the network
address of Adaptive Server using the interfaces file entry. Once this request is
serviced, the operating system provides a socket to the application. The application
then sends login packets to Adaptive Server.
• Connection pooling can improve network performance in some situations. Evaluate
your own application, and see if connection pooling can help in your case.
• During processing of client requests, it is possible that in some environments the
packet size configured for Adaptive Server is not big enough to handle these requests.
Use sp_sysmon to evaluate your network activity during slow and busy periods and
calculate the average packet size during sampled periods. Do not increase your default
packet size unless you can prove that it is cost-effective, since changing this parameter
will consume additional memory.

In some cases, it makes sense to allocate additional network memory to support the use of
packet sizes larger than the server’s default size. There are some processes — bcp,
readtext/writetext, and OLAP queries — that definitely benefit from larger packet size. The
System Administration Guide shows how to calculate the value for this parameter, but always
ensure that the larger packets are, in fact, being used, since the memory allocated to support
these packets is wasted if they are not (your bcp is using the -A parameter, right?).
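A hedged sketch of the configuration involved follows; the values are illustrative only and should be sized per the System Administration Guide for your environment:

-- Raise the ceiling for client-requested packet sizes and reserve the
-- memory to back them (illustrative values, not recommendations).
sp_configure "max network packet size", 4096
go
sp_configure "additional network memory", 1048576
go
-- A client such as bcp can then request the larger packet, for example:
--   bcp pubs2..titles out titles.bcp -Usa -P*** -SSERVER -c -A 4096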

• Though this may seem like mission impossible, try to keep network activity in mind
when deciding what applications will share the same ASE server. Whenever possible,
make sure the applications sharing a server have a similar network activity profile.
Whenever possible, spread your network-intensive processes over more than one
network.
• Always be alert for the presence of 1605 and 1608 messages in your errorlog. This
could indicate a hiccup in the network, or there could be a problem either in the
network software or network layer of Adaptive Server. Again, always work closely
with your network support group.
• When calculating the network activity your application will generate, be very sensitive
to the application’s current and expected use of Replication Server, CIS, and two-
phase commits. These components will generate extensive network traffic.
• If your host machine is configured with multiple network cards, you should configure
ASE to use multiple networks. This reduces traffic on any given network and lets ASE
balance its network load. To use multiple networks, you need to add additional
“master” (listen) and “query” entries in the interfaces file or sql.ini file, depending
upon your operating system.

Tips on Understanding Network Performance and Finding Causes

Using the site handler for RPCs to a remote server enables you to multiplex multiple requests over a single socket connection. This can be beneficial for servers that don't produce many simultaneous RPC requests. However, servers that produce multiple concurrent RPCs should activate the CIS RPC handling option. With this option turned on, each RPC request creates a separate socket connection, thus minimizing socket congestion.
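A hedged example of enabling this behavior; verify the parameter name against your release's documentation:

sp_configure "cis rpc handling", 1
go
-- Individual sessions can also request it with:
--   set cis_rpc_handling on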

Integrate sp_sysmon into your network tuning. You should always gather results from
sp_sysmon at various time frames and during various loads on Adaptive Server, and make
sure that you sample during peak and non-peak periods; typical sampling intervals of five
minutes should be sufficient. When reviewing your sysmon output, pay attention to the
number of packets sent and received as well as the average bytes per packet.

In addition, there are some TCP-related parameters that are set at the OS level, and their default values can change between OS versions, which sometimes causes a performance problem. For example, on one particular UNIX-based operating system the default value of tcp_deferred_ack_interval changed from 50 to 100 between releases. There are a few more TCP-level parameters that you may need to check. These are:

• tcp_close_wait_interval
• tcp_conn_req_max_q
• tcp_conn_req_max_q0
• tcp_ip_abort_interval
• tcp_keepalive_interval
• tcp_rexmit_interval_initial
• tcp_rexmit_interval_max
• tcp_rexmit_interval_min
• tcp_smallest_anon_port
• tcp_slow_start_initial
• tcp_xmit_hiwat, tcp_recv_hiwat

Not all of these values are useful in improving performance. However, setting them properly helps overall performance and avoids certain unwanted, non-trivial errors.

Summary

Configuration of Adaptive Server for network performance is a delicate task. You need to
balance between overall throughput, user response time, cost effectiveness, future growth, and
changes in the application and environment.

Change network-related configuration parameters only after checking performance and modifying other parameters. Always introduce changes in a controlled fashion, not with an "everybody into the pool" approach, and always perform before-and-after monitoring.

Be cognizant of your environment and its limitation because there are always limits on what
you can achieve through tuning. You can’t tune a system beyond its physical capabilities.

Chapter 17: Performance Metrics


Overview

How do you measure performance? Unless you can quantify the results of the tweaks you
make to your system, how do you know that your changes were positive? From our very first
chapter, we stated that your performance changes had to be measurable and reproducible. So,
what do you measure? What do you look for?
In Chapter 15, we talked about how to tune and the effects of changing many of the config
parameters in relation to the monitored performance results. In this chapter, we’ll discuss
what you want to monitor on a regular basis and what some of the trends may mean.

It is a good idea to monitor your server’s performance on a periodic basis, and we recommend
a tight period. At most high performance shops I visit, we monitor for five minutes every five
minutes. Granted, that’s reminiscent of the days when sp_sysmon would only permit a five-
minute interval, but it does seem to give us great trending information. We can track that information in Sybase Historical Server, or we can use our own mechanism, whether that is loading the information into a spreadsheet or into a database, and do the kind of analysis that the data warehousing people keep boasting about.

In short, we’re recommending that you run sp_sysmon very regularly, and track the output
over time. So, which output parameters are worth tracking over time? We’re looking for
trending information in a variety of areas. We’re going to categorize these into areas that
include:

• CPU (Are there enough processing cycles to handle our load as the load changes?)
• Memory management (Do we have sufficient memory, and is it being utilized
optimally?)
• I/O (Is our load balanced?)
• Network (Is our system performance being limited by a hardware resource that is
beyond the scope of our server?)

The following high-level sources to monitor are intended to give you an idea, over time, of
which resources have changing requirements and of these which need to be addressed.

CPU Utilization Percentage

This one’s easy. The beginning of each sp_sysmon report lists the CPU utilization for each
Adaptive Server engine as well as the total average CPU utilization percentage. We
recommend that you track average CPU percentage utilization for the entire server. So if you
have 12 CPUs, simply track the one number, total CPU.

If these numbers are erratic, it might make sense to take a closer look at individual sp_sysmon
report outputs. Occasionally, you’ll see discrepancies in the CPU numbers for individual
engines (for example, seven engines are at 20 percent, and one engine is at 90 percent). This
typically means that you have a long-running process, and it is not necessarily something to worry about.

As your average CPU utilization increases, consider ordering more hardware when you hit 70
percent or more. Adaptive Server scales very well up to 100 percent, but you need some time
to order, receive, test, install, etc., your hardware. Seventy percent is a good number to be
running at peak, but most shops have increasing needs.

sp_sysmon output:

Kernel Utilization
------------------

  Engine Busy Utilization
    Engine 0                      93.2 %
    Engine 1                      95.4 %

    Total                         94.3 %

Memory Management
Spinlock

A spinlock controls entry into each individual cache. When the spinlock contention percentage is higher than zero, the individual Adaptive Server processes (CPU engines) are contending with one another while trying to get into the hash tables in the cache(s). If you are consistently seeing spinlock contention higher than 5 to 10 percent in any of your caches, it is probably time to split up the caches. Refer to the section titled "Named Cache" in Chapter 7 or see Administrator's Guide to Sybase ASE 12.5.

Also remember that there’s the possibility of changing the spinlock-related sp_configure
parameters.

sp_sysmon output:

-----------------------------------------------------------------------------
Cache: default data cache
                               per sec      per xact       count    % of total
  -------------------------  ------------  ------------  ----------  ---------
  Spinlock Contention                 n/a           n/a         n/a      0.0 %

Context Cache Miss

There is a section in Chapter 15 that describes context switches, that is, the transition of processes off of engines. In other words, what causes an engine to let go of a process?

If the Cache Search Misses percentage is over 10 percent, consider more memory for your box and assign more memory to cache; you want the majority of context switching to be caused by the end of processing (voluntary yields).

sp_sysmon output:

Task Context Switches Due To:
  Voluntary Yields                  0.1        2.0       4       11.8 %
  Cache Search Misses               0.0        0.0       0        0.0 %
  System Disk Writes                0.1        3.5       7       20.6 %
  I/O Pacing                        0.0        1.0       2        5.9 %
  Logical Lock Contention           0.0        0.0       0        0.0 %
  Address Lock Contention           0.0        0.0       0        0.0 %
  Latch Contention                  0.0        0.0       0        0.0 %
  Log Semaphore Contention          0.0        0.0       0        0.0 %
  PLC Lock Contention               0.0        0.0       0        0.0 %
  Group Commit Sleeps               0.0        0.0       0        0.0 %
  Last Log Page Writes              0.1        2.0       4       11.8 %
  Modify Conflicts                  0.0        0.0       0        0.0 %
  I/O Device Contention             0.0        0.0       0        0.0 %
  Network Packet Received           0.1        1.5       3        8.8 %
  Network Packet Sent               0.1        2.0       4       11.8 %
  Other Causes                      0.2        5.0      10       29.4 %

Total Cache Misses (or Cache Hits) and Total Cache Searches

Similar to but distinct from the previous item, this reflects the number of requests made against cache and the percentage of hits versus misses. At most shops, the cache hit ratio reaches a steady state over time, in the neighborhood of 95 to 99 percent. It's important to note where yours settles, so that when it changes, you know to begin investigating.

sp_sysmon output:

Cache Statistics Summary (All Caches)
-------------------------------------
                                  per sec    per xact   count   % of total
                                  --------   --------   -----   ----------

  Cache Search Summary
    Total Cache Hits                   2.7       82.0     164       95.3 %
    Total Cache Misses                 0.1        4.0       8        4.7 %
  -------------------------       --------   --------   -----
    Total Cache Searches               2.9       86.0     172

Cache Searches per Second

This gives you an idea, over time, of the number of I/O requests that are made; a cache search is a request for a page. Increasing numbers here are not necessarily something to worry about, but they are a clue to the overall activity of your system.

sp_sysmon output:

-----------------------------------------------------------------------------
Cache: default data cache
                                  per sec    per xact   count   % of total
  -------------------------       --------   --------   -----   ----------
  Spinlock Contention                  n/a        n/a     n/a        0.0 %
  Utilization                          n/a        n/a     n/a      100.0 %

  Cache Searches
    Cache Hits                         2.7       82.0     164       95.3 %
      Found in Wash                    0.1        3.5       7        4.3 %
    Cache Misses                       0.1        4.0       8        4.7 %
  -------------------------       --------   --------   -----
    Total Cache Searches               2.9       86.0     172

Deadlocks

Some shops seem plagued by deadlocks, while others seem to not notice them at all. To those
shops that are plagued by them: First, it’s normal to have some, and second, if you have a lot,
it’s probably an application problem.

This is good to track over time because then you can quantify whether they are actually
increasing or decreasing at application roll-out time. Normal numbers are a few a day to a few
a week or month.

sp_sysmon output:

Deadlocks by Lock Type            per sec    per xact   count   % of total
  -------------------------       --------   --------   -----   ----------
  Total Deadlocks                      0.0        0.0       0         n/a

Deadlock Detection
  Deadlock Searches                    0.0        0.0       0         n/a

Locks

If you’ve never run into errors in your log that indicate that the number of locks has exceeded
the limit, it may be a waste of your time to monitor this.

On the other hand, if you’re suddenly changing your locking scheme to row-level locking and
don’t know how this is going to affect the number of locks, it may pay to monitor the number
of locks on your production system before and after.

The easiest way is to simply run a batch job that selects count(*) from the syslocks table every ten minutes and stores that number someplace, as in the sketch below.
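A minimal sketch, assuming a hypothetical history table named lock_history that you create beforehand:

-- One-time setup of the hypothetical history table.
create table lock_history (sample_time datetime, lock_count int)
go
-- Scheduled every ten minutes (cron, your job scheduler, etc.):
insert lock_history
    select getdate(), count(*) from master..syslocks
go
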
I/O Management
Disk Statistics from sp_sysmon

For each disk device defined to Adaptive Server, sp_sysmon reports the amount of I/O and
the percentage of total I/O from the system which that device has used. For example, it might
report that the master device is being accessed by 9 percent of the total I/O requests made by
the server. We recommend that you track I/O for each device and track it all the time. This
can yield important information about changes in trends pertaining to how the devices are
being used.

sp_sysmon output:

Device Activity Detail
----------------------

Device:
  D:\SYBASE125\data\master.dat
  master                          per sec    per xact   count   % of total
  -------------------------       --------   --------   -----   ----------
  Reads
    APF                                0.0        0.0       0        0.0 %
    Non-APF                            0.0        0.0       0        0.0 %
  Writes                               0.2        6.5      13      100.0 %
  -------------------------       --------   --------   -----   ----------
  Total I/Os                           0.2        6.5      13      100.0 %

Disk Statistics from iostat

If you want to know about capacity for each of the devices, you will have to go out to the
operating system. On UNIX, this is obtained by using iostat, which will tell you the
percentage of capacity for each of these physical devices.

It is important to understand the difference between the percentage number that Adaptive
Server gives you, which is the percentage of the server’s I/O that goes to that disk, and the
percentage number that iostat gives you, which is the capacity of the disk. These are two
separate pieces of important information. One tells you about the server’s needs; the other
tells you if you can get data off the disk in sufficient time.

If iostat is telling you that you’re at 90+% consistently, you may need to start spreading I/O
across more devices. If you are using file-system devices as your database devices, then you
should also check data from vmstat output. Watch for changes in free pages and in the fr and po columns under page. The fr and po values should typically be 0. If they are large, this
means that the OS-level page stealer daemon process is swapping pages in and out of
memory, drastically impacting performance. In such a situation, you need to look at total
memory on the host machine and other OS-level configurations.
Network Management

For most purposes, the network performance is SEP (that’s “Somebody Else’s Problem” for
those unfamiliar with The Hitchhiker’s Guide to the Galaxy).

If your users are experiencing performance problems, and you find that CPU utilization is
low, cache hits are high, and physical disk is low, chances are, unless you have some long-
running queries (which you will test by hand), your bottleneck is at the network. All of the
networking people that I’ve worked with tell me to treat the network as a bottleneck.

So, how do we determine that the mouth of the bottleneck is or has become insufficient?

Network Packets Received/Sent

The only way I know of (from a server perspective) is to monitor the network statistics at the
server side. You can see what the trend is: the amount of data being passed into or retrieved
from the server.

Beware of sharp upward trends.

sp_sysmon output:

=============================================================================

Network I/O Management
----------------------

  Total Network I/O Requests             0.1        3.5       7         n/a
  Network I/Os Delayed                   0.0        0.0       0        0.0 %

  Total TDS Packets Received        per sec    per xact   count   % of total
  -------------------------         --------   --------   -----   ----------
    Engine 0                             0.1        1.5       3      100.0 %
  -------------------------         --------   --------   -----
  Total TDS Packets Rec'd                0.1        1.5       3

  Total Bytes Received              per sec    per xact   count   % of total
  -------------------------         --------   --------   -----   ----------
    Engine 0                            10.5      315.5     631      100.0 %
  -------------------------         --------   --------   -----
  Total Bytes Rec'd                     10.5      315.5     631

  Avg Bytes Rec'd per Packet             n/a        n/a     210         n/a
-----------------------------------------------------------------------------
  Total TDS Packets Sent            per sec    per xact   count   % of total
  -------------------------         --------   --------   -----   ----------
    Engine 0                             0.1        2.0       4      100.0 %
  -------------------------         --------   --------   -----
  Total TDS Packets Sent                 0.1        2.0       4

  Total Bytes Sent                  per sec    per xact   count   % of total
  -------------------------         --------   --------   -----   ----------
    Engine 0                            22.9      686.0    1372      100.0 %
  -------------------------         --------   --------   -----
  Total Bytes Sent                      22.9      686.0    1372

  Avg Bytes Sent per Packet              n/a        n/a     343         n/a

Chapter 18: The Audit System


Tuning the Audit System

Sybase Adaptive Server auditing provides the ability to record the occurrence of various
events on the server. Auditing can record every single event on the server, such as logins,
logouts, database access, administrative actions, inserts, deletes, server reboots, and more. For
the audit system to work effectively and efficiently, it must be properly installed and
configured. On a properly configured and tuned server with adequate capacity, enabling this
feature will have minimal impact on the overall performance of the server. If your system has
marginal resources, you may experience degradation as you increase the number of tasks you
audit.

The audit system first appeared for SQL Server version 10, allowing DBAs and system
administrators to keep track of activities on the server. Auditing is a security-related
mechanism that allows keeping an audit trail, a log of audit records, to detect such activities
as infiltration of the system, misuse of system resources, patterns of access, and general use of
the system. The following questions can be answered with an audit trail in place:

• Who deleted rows from the order table?


• Who entered the system last night at midnight?
• How many updates and deletes happen each day on my server?
• What was the value in a column before it was updated?

The audit system allows the inspection of individual logins, database objects, and database
activities, and several traces within the audit trail can be run concurrently.
The System Security Officer is the only user who can:

• Start and stop auditing


• Set up auditing options
• Process the audit data

Note The SSO can:

• Create logins
• Lock logins
• Display others’ login information
• Administer passwords
• Grant and revoke SSO and OPER roles
• Manage the audit system

The SSO can’t:

• Modify or drop logins

What Can Be Audited?


Server Level              Database Level   Object Level      User Level
------------------------  ---------------  ----------------  ----------
Logins                    Grant            Delete            cmdtext
Logouts                   Revoke           Exec_procedure
Reboots                   Truncate         Exec_trigger
Remote procedure calls    Drop             Func_obj_access
dbcc commands             Load             Insert
Security                  Create           Reference
Disk                      Alter            Select
Ad hoc                    bcp              Update
                          Bind
                          Unbind
                          Dbaccess
                          Dump

The sybsecurity Database

The sybsecurity database is automatically created during the auditing installation process. The
sybsecurity database contains sysauditoptions, a system table that contains one row for each
server-wide audit option, and system tables for the audit trail. Other types of auditing option
settings are stored in other tables. For example, database-specific option settings are stored in
sysdatabases. When you first install the audit system, several auditing-related stored
procedures are also created in the sybsystemprocs database.

When the sybsecurity database is created during the audit install process, you are responsible
for placing it on disks. It is crucial that you build the audit system correctly for maximum
performance and for minimum interruption of server activities.
The sybsecurity database should be placed on its own disk, one that is otherwise inactive. This allows for greater performance of the audit system. In addition to placing the audit system on its own disk, place the sysaudits tables on their own devices and segments. If the audit system is in place and fills up, it can cause havoc: users who are being audited will not be able to log into the server. Therefore, it is highly critical that the audit tables are set up correctly.

When installing sybsecurity, you must use a separate device for the syslogs system table,
which contains the transaction log. The syslogs table, which exists in every database, contains
a log of transactions that are executed in the database.

Note You should not drop the sybsecurity database if auditing is enabled. When auditing is disabled, only the System Security Officer can drop sybsecurity.
Tips • Install auditing tables and devices in a one-to-one ratio, whereby each table will
reside on its own device. Installing each table on its own device assures that in case
of a device failure, a device can be taken out of use by the SSO, and the other
devices that were configured for auditing will continue the audit trail with minimal
data loss.
• You can obtain additional auditing capacity by adding more auditing tables and
devices (only up to eight devices). If you install the seventh or eighth device, for
example, you should make the device considerably larger than what your audit
system currently needs, since you will no longer be able to add any additional
devices.
• It’s highly important that any sybsecurity device be a separate device from the
master device, as this disk can be hit hard.

Installing the Audit System

The audit system is usually installed with auditinit (a user interface), which is the Sybase
installation program. It can also be installed with installsecurity, which is more of a manual
installation. Use the installation method that best suits your needs. In addition, use the
documentation for your specific platform rather than generic installation documentation.

When you first install auditing, you can establish the number of audit tables you want to use
for the audit trail, the device for each audit system table, and the device for the transaction
log. All of these features are discussed later in the chapter.

Installing Auditing with installsecurity

This example assumes the use of a server that uses a logical page size of 2K.

1. Locate the installsecurity script in the $SYBASE/scripts directory.

2. Create the auditing devices and auditing database with the Transact-SQL disk init and
   create database commands. For example:

   disk init name = "auditdev1",
       physname = "/dev/rdsk/c2t1d0s6",
       size = "20M"
   disk init name = "auditlogdev1",
       physname = "/dev/rdsk/c2t1d0s7",
       size = "4M"
   create database sybsecurity on auditdev1
       log on auditlogdev1

3. Use isql to execute the installsecurity script:

   cd $SYBASE/scripts
   setenv DSQUERY server_name
   isql -Usa -Ppassword -Sserver_name -i installsecurity -o installsecurity.out

4. Shut down and restart Adaptive Server.

When the installsecurity script has been used, the sybsecurity database is created with
only one audit table (sysaudits_01), which is created on its own segment. It is not
recommended that auditing be enabled yet. First, create at least one more table that
will allow you to use additional auditing features to avoid a table getting full. Second,
follow the advice in this chapter regarding the addition of a threshold procedure before
enabling auditing.

Installing Auditing with auditinit

To configure the Sybase server for auditing, use the auditinit program located in
$SYBASE/install/auditinit. Follow the steps to configure auditing as described on the screen
and in your specific platform’s manual.

The Audit Tables

The audit trail is stored in system tables named sysaudits_01 through sysaudits_08. If you
decide to use three audit tables, for example, you will have sysaudits_01, sysaudits_02, and
sysaudits_03. At any given moment during auditing, only one of these tables is current and all
auditing data is written to that specific table.

Note Sybase strongly recommends not using a single audit table on production systems. If
you use only a single audit table, you may lose audit records, cause the server to hang,
and cause a halt of audited users.

When you install auditing, install at least two to three tables, each on its own device, and
place them in a separate segment to allow for a threshold procedure.

Note Your audit tables are heap tables, tables without a clustered index. If your tables are
used frequently and are constantly replaced, you may want to consider partitioning the
tables.
Tips • Be sure that auditing is installed with two or more tables, each on a separate
device.
• Write a threshold procedure and attach it to each audit table segment (see an
example later in the chapter).
• Set the configuration parameters that control the audit queue size and the action to take if the current audit device becomes full (see the example after this list).
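A hedged example of checking these settings; the values shown are illustrative, and "suspend audit when device full" controls what ASE does when an audit device fills:

sp_configure "audit queue size", 100
go
sp_configure "suspend audit when device full", 1
go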

Note Reboot the server after installing auditing.


Understanding the Audit Tables

The system audit tables can only be accessed by a System Security Officer (SSO). The SSO
can execute SQL commands to read and analyze the data in the tables using select and
truncate.

The following table lists the columns in the audit tables:

  Column Name   Datatype            Description
  -----------   -----------------   --------------------------------------------
  event         smallint            Type of event being audited
  eventmod      smallint            Additional information about the audited
                                    event's permission checks:
                                      0 - no modifier for this event
                                      1 - the event passed permission checking
                                      2 - the event failed permission checking
  spid          smallint            ID of the process that caused the audit
                                    record to be written (server, login,
                                    database, object)
  eventtime     datetime            Date and time the audited event occurred
  sequence      smallint            The number of sequential records within a
                                    single event; some events require more than
                                    one audit record
  suid          smallint            The server login ID of the user who
                                    performed the audited event
  dbid          int null            ID of the database in which the audited
                                    event occurred or in which the audited
                                    object is located
  objid         int null            ID of the object (table, stored procedure,
                                    or trigger)
  xactid        binary(6) null      The transaction ID containing the audited
                                    event (from the database where the
                                    transaction originated)
  loginname     varchar(30) null    The login name of the user
  dbname        varchar(30) null    The database name
  objname       varchar(30) null    The object's name
  objowner      varchar(30) null    Name of the object's owner
  extrainfo     varchar(255) null   Additional information about the event

The Extrainfo Column

The extrainfo column contains a series of data “fields” separated by semicolons. The data is
organized in the following categories:

Position   Category              Description

1          Roles                 A list of active roles, separated by blanks
2          Keywords or options   The name of the keyword or option that was used (for
                                 example insert, delete, exec_trigger)
3          Previous value        If the event resulted in an update, the value of the
                                 field prior to the update
4          Current value         The current value of the field following the update
5          Other information     Additional security information
6          Proxy information     The original login name if a set proxy command was used
7          Principal name        The principal name from the underlying security
                                 mechanism if the user's login is the secure default
                                 login and the user logged into the server via unified
                                 login; NULL if the secure default login is not being
                                 used

The Event Column


Option                     Command                                             Event

(not controlled by a       enable auditing: sp_configure "auditing", 1           73
specific auditing option)  disable auditing: sp_configure "auditing", 0          74
ad hoc                     user-defined audit record                              1
alter                      alter database                                         2
                           alter table                                            3
bcp                        bcp in                                                 4
bind                       sp_bindefault                                          6
                           sp_bindmsg                                             7
                           sp_bindrule                                            8
built-in functions         built-in functions                                    86
create                     create database                                        9
                           create table                                          10
                           create procedure                                      11
                           create trigger                                        12
                           create rule                                           13
                           create default                                        14
                           sp_addmessage                                         15
                           create view                                           16
dbaccess                   access to a database by any user                      17
dbcc                       all dbcc commands                                     81
delete                     delete from a table                                   18
                           delete from a view                                    19
disk                       disk init                                             20
                           disk refit                                            21
                           disk reinit                                           22
                           disk mirror                                           23
                           disk unmirror                                         24
                           disk remirror                                         25
drop                       drop database                                         26
                           drop table                                            27
                           drop procedure                                        28
                           drop trigger                                          29
                           drop rule                                             30
                           drop default                                          31
                           sp_dropmessage                                        32
                           drop view                                             33
dump                       dump database                                         34
                           dump transaction                                      35
errors                     fatal error                                           36
                           non-fatal error                                       37
exec_procedure             procedure execution                                   38
exec_trigger               trigger execution                                     39
func_obj_access /          access to databases and objects using                 85
func_dbaccess              T-SQL functions and commands
grant                      grant                                                 40
insert                     insert into a table                                   41
                           insert into a view                                    42
load                       load database                                         43
                           load transaction                                      44
login                      login to the server                                   45
logout                     logout from the server                                46
reference                  any reference creation to tables                      91
revoke                     revoke                                                47
rpc                        remote procedure call from a remote server            48
security                   server boot                                           50
                           server shutdown                                       51
                           role toggling (role being set on and off)             55
                           regeneration of a password by the SSO                 76
                           proc_role function executed through a                 80
                           system procedure
                           sp_configure                                          82
                           online database                                       83
                           valid_user                                            85
                           set proxy or set session authorization                88
                           kill (CIS only)                                       89
                           connect to (CIS only)                                 90
select                     select from a table                                   62
                           select from a view                                    63
setuser                    setuser                                               84
table_access               delete                                                18
                           insert                                                41
                           select                                                62
                           update                                                70
truncate                   truncate table                                        64
unbind                     sp_unbindefault                                       67
                           sp_unbindrule                                         68
                           sp_unbindmsg                                          69
update                     update to a table                                     70
                           update to a view                                      71
view_access                delete                                                19
                           insert                                                42
                           select                                                63
                           update                                                71

Changing the Current Audit Table

The current audit table configuration parameter determines the table into which the server will
write audit rows. An SSO can change the current audit table with sp_configure using the
following syntax:

sp_configure "current audit table", n [, "with truncate"]

where n is an integer that determines the new current audit table. The valid values for n are:

• 1 means sysaudits_01, 2 means sysaudits_02, and so forth.


• 0 tells Adaptive Server to automatically set the current audit table to the next table.
For example, if your installation has four audit tables — sysaudits_01, sysaudits_02,
sysaudits_03, and sysaudits_04 — the server sets the current audit table to:
o 2 if the current audit table is sysaudits_01
o 3 if the current audit table is sysaudits_02
o 4 if the current audit table is sysaudits_03
o 1 if the current audit table is sysaudits_04

The with truncate option specifies that Adaptive Server should truncate the new table if it is
not already empty.

Note If you do not specify the with truncate option and the next table is not empty,
sp_configure will fail.

If Adaptive Server truncates the current audit table and you have not archived the data,
the table’s audit records are lost. Archive the audit data before you use the with truncate
option.
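For example, to switch manually to the next audit table and clear it in one step (assuming any
data it holds has already been archived), an SSO could run:

sp_configure "current audit table", 0, "with truncate"
go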

Archiving the Audit Table

The audit table can be archived using the insert command with select (insert into x select *
from y). The destination table should be in a different database, both to keep as much free
space as possible within the sybsecurity database and because adding user objects to
sybsecurity would force you to manage its transaction log (sybsecurity is usually set to trunc
log on chkpt), making it harder to maintain. If a threshold procedure is put in place, make
sure that it copies the data into the archive table.

1. Create the archive database on a separate device from the one containing the audit tables
in sybsecurity (see the sketch after this list).
2. Make sure the select into/bulk copy database option is on in the archive database, so
you can run select into commands.
3. Create an archive table with columns identical to those in the sybsecurity audit tables.
You can do this by running select into with a false condition in the where clause, which
creates an empty copy. For example:

use audit_archivedb
go
select *
into audit_data_table
from sybsecurity.dbo.sysaudits_01
where 1 = 2
go

The where condition in this case is always false, so an empty duplicate of
sysaudits_01 is created.
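As a sketch of steps 1 and 2, with a hypothetical device name and size:

/* create the archive database on its own device (name and size are illustrative) */
create database audit_archivedb on archive_dev = 100
go
/* allow select into in the archive database */
sp_dboption audit_archivedb, "select into/bulk copy", true
go
use audit_archivedb
go
checkpoint
go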

If you decide to use a threshold procedure, it can use insert with select to copy data into the
archive table in the archive database. The procedure can execute commands similar to these:

insert audit_archivedb.sso_user.audit_data_table
select * from sybsecurity.dbo.sysaudits_01

Setting Up a Threshold Procedure


Note In some Sybase documents (pre-version 12), the example of the threshold stored
procedure is incorrect. The example reads the value of current audit table from
sysconfigures instead of syscurconfigs.

Before enabling auditing, a threshold procedure should be put in place to automatically switch
auditing tables when the current table is full. The threshold procedure should do the
following:

• Using sp_configure, the procedure switches from the current audit table to the
next audit table.
• It will archive the audit table that is almost full.

This sample threshold procedure assumes that four tables are configured for auditing:

declare @audit_table_num int

/*
** Get the number of the audit table that is currently in use.
*/
select @audit_table_num = scc.value
from master.dbo.syscurconfigs scc, master.dbo.sysconfigures sc
where sc.config = scc.config and sc.name = "current audit table"

/*
** Set the next audit table to be current by using the value of 0, which
** automatically makes the next sysaudits table the "current" table.
*/
exec sp_configure "current audit table", 0, "with truncate"

/*
** Copy the audit records from the audit table that became full in
** sybsecurity into an archive table in a different database.
*/
if @audit_table_num = 1
begin
    insert aud_db.sso_user.sysaudits
    select * from sysaudits_01
    truncate table sysaudits_01
end
else if @audit_table_num = 2
begin
    insert aud_db.sso_user.sysaudits
    select * from sysaudits_02
    truncate table sysaudits_02
end
else if @audit_table_num = 3
begin
    insert aud_db.sso_user.sysaudits
    select * from sysaudits_03
    truncate table sysaudits_03
end
else if @audit_table_num = 4
begin
    insert aud_db.sso_user.sysaudits
    select * from sysaudits_04
    truncate table sysaudits_04
end
return(0)

Attaching the Threshold Procedure to Each Audit Segment

Use the sp_addthreshold system procedure to attach the threshold procedure to each audit
table segment.

Before executing sp_addthreshold:

• Determine the number of audit tables configured for your installation and the names of
their device segments.
• Make sure you have the permissions and roles needed to run sp_addthreshold and
all of the commands in the threshold procedure.

Warning sp_addthreshold and sp_modifythreshold check to ensure that only a user with sa_role
directly granted can add or modify a threshold. All system-defined roles that are active
when you add or modify a threshold are inserted as valid roles for your login in the
systhresholds table. However, only directly granted roles are activated when the
threshold procedure fires.
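As a sketch, assuming the audit table segments are named aud_seg_01 through aud_seg_03 and
the threshold procedure is named audit_thresh (all of these names are hypothetical), the
thresholds could be attached as follows. The free-space value is in pages and should be large
enough to let the procedure finish before the segment fills:

use sybsecurity
go
sp_addthreshold sybsecurity, aud_seg_01, 250, audit_thresh
go
sp_addthreshold sybsecurity, aud_seg_02, 250, audit_thresh
go
sp_addthreshold sybsecurity, aud_seg_03, 250, audit_thresh
go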

When Audit Data is Recorded

Let’s say that user George decides to delete rows from the NeverTouch table. The following
steps outline how the audit process records this data.

1. Someone is deleting rows from the NeverTouch table. Your manager is furious and
asks you to find out who’s deleting these rows as soon as possible.
2. User George deletes rows from the NeverTouch table, which has auditing enabled for
all delete statements.
3. The information about the deletion is stored in the audit queue, which resides in shared
memory.
4. Records from the audit queue are written to disk in the current audit table.
5. Querying the audit table, you find that George was the user who deleted rows from the
NeverTouch table.
6. You report this information back to your manager, who gives you a huge bonus and
extra vacation days.

Auditing Configuration Parameters and System Procedures

The following configuration parameters manage the audit system. Each of them is set with
sp_configure.

AUDITING

units    0 or 1 (flag)
default  0 (disabled)

The auditing parameter enables or disables auditing for the whole server. Turn this parameter
on if you'd like to use auditing on your system. The parameter is dynamic, and auditing begins
as soon as you turn it on (by setting it to 1). Pay careful attention to auditing once you turn
this parameter on, since space in your audit database can fill up quickly.

AUDIT QUEUE SIZE

units    integer (number of audit records)
default  100
minimum  1
maximum  65535

The audit queue size parameter determines the number of audit records that can be held in
the in-memory queue before being written to the database. Queuing the records means that
when many audit records are being generated, the server does not have to flush them to disk
very often. If you record large audit trails, set this parameter to a higher value. If your audit
trail is rather small, you can either leave the default of 100 records or increase it as you deem
necessary. This parameter affects memory allocation, which takes effect only after the server
is rebooted in pre-12.5 versions. As of version 12.5, this configuration option is dynamic.

CURRENT AUDIT TABLE

units    integer (audit table number)
default  1
minimum  0
maximum  8

The current audit table parameter determines which audit table the audit trail is currently
written to. Keeping the trail in multiple audit tables lets you track audit records with minimal
intervention: if an audit table fills up, for example, you can automatically switch to the next
table with a simple threshold procedure.

Another option is a "rolling audit," in which each day of the week is written to its own audit
table. When the first day of the next week arrives, you truncate that day's table and begin the
"rolling audit" again.

This option is dynamic and takes effect immediately upon execution of sp_configure. If you
use the with truncate option, the server truncates the table when it begins using it, and all data
stored in the table is deleted.

SUSPEND AUDITING WHEN FULL


units    0 or 1 (flag)
default  1

This option controls the behavior of the audit process when an audit device becomes full (if a
table does not have a threshold procedure). This value is dynamic and therefore takes effect
immediately upon execution of sp_configure. Values are either (1), which suspends the audit
process and all auditable user processes (default), or (0), which truncates the next audit table
and starts using it as the current table.
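As an example, a site with a moderately heavy audit load might set the parameters like this
(the values are only illustrative):

sp_configure "audit queue size", 300
go
sp_configure "suspend audit when device full", 1
go
sp_configure "auditing", 1
go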

System Procedures for Auditing

Use these system procedures to manage the auditing process:

• sp_audit — Enables and disables auditing options. This is the only system procedure
required to establish the events to be audited.

Note Run sp_audit with no parameters to see a complete list of the available options.

• sp_displayaudit — Displays the active auditing options.


• sp_addauditrecord — Adds user-defined audit records (comments) into the audit
trail. Users can add these records only if a System Security Officer enables ad hoc
auditing with sp_audit.
• sp_addaudittable — Adds another system audit table after auditing is installed.

Note Beginning with ASE 11.5.x, the new stored procedure sp_audit replaced the previous
auditing procedures: sp_auditoption, sp_auditlogin, sp_auditdatabase, sp_auditobject, and
sp_auditproc. sp_addauditrecord still exists.

Managing the Audit System Transaction Log

If the trunc log on chkpt database option is active, the server truncates the log (syslogs) every
time it performs an automatic checkpoint. After auditing is installed, the value of trunc log on
chkpt is on, but you can use sp_dboption to change its value.

Warning It is a very good idea to have trunc log on chkpt set to on in the sybsecurity
database. Otherwise, the log fills up and all audited processes are suspended.

Truncating the Transaction Log

If you enable the trunc log on chkpt option for the sybsecurity database, you do not need to
worry about the transaction log becoming full. The server truncates the log whenever it
performs a checkpoint. When the option is set on, you cannot dump the transaction log with
dump transaction, but you can use dump database to dump the database. If you follow the
procedures in the “Setting Up a Threshold Procedure” section, audit tables are automatically
archived to tables in another database. You can use standard backup and recovery procedures
for this archive database.

If a crash occurs on the sybsecurity device, you can reload the database and resume auditing.
Most likely, the only records lost are the ones in the queue and in the current audit table. In a
later section, we will discuss how large your queue should be, which also affects how much
data you may lose if a crash were to occur.
Managing the Transaction Log with No Truncation

If you set the database option trunc log on chkpt to off, the transaction log is not truncated
automatically and will continue to grow until you dump it. You can use the standard
last-chance threshold procedure on the log segment to make sure the log does not fill up.

In the sybsecurity database, when the transaction log reaches the last-chance threshold point,
any transaction that is running is suspended until space becomes available. The suspension
occurs because the database option abort transaction when log is full is always set to FALSE
for the sybsecurity database. You cannot change this option.

Note To truncate a system audit table (sysaudits_01 to sysaudits_08), you must be a System
Security Officer (or have the SSO role).

With the trunc log on chkpt option off, you can use standard backup and recovery procedures
for the sybsecurity database as you would for any of your other databases. Be aware that the
audit tables in the restored database may not be in sync with their status at the time of a device
failure.

Suspending Auditing if Devices are Full

If you have set up your auditing system correctly, you should have two or more audit tables,
each on a separate device other than the master device, and a threshold procedure in place for
each of the audit table segments. The audit devices should then never become full; the "full"
condition can occur only if a threshold procedure is not functioning properly.

If a device were to become full, it could mean disaster for you. Therefore, even though the
procedures above ensure this will not happen, there are steps you can take to decide what
happens when the device does fill up.

Choose one of these options:

• Suspend the auditing process and all user processes that cause an auditable event.
Resume normal operation after a System Security Officer clears the current audit
table.
• Truncate the next audit table and start using it. This allows normal operation to
proceed without intervention from a System Security Officer.

To set this configuration parameter, use sp_configure. The syntax is:

sp_configure "suspend audit when device full", [0|1]

The option of 0 truncates the next audit table and starts using it as the current audit table
whenever the current audit table becomes full. If you set the parameter to 0, the audit process
is never suspended; however, older audit records will be lost if they have not been archived.

The 1 option (the default value) suspends the audit process and all user processes that cause
an auditable event. To resume normal operation, the System Security Officer must log in and
set up an empty table as the current audit table. During this period, the System Security
Officer is exempt from normal auditing. If the System Security Officer’s actions would
generate audit records under normal operation, Adaptive Server sends an error message and
information about the event to the errorlog. If you have a threshold procedure attached to the
audit table segments, set suspend audit when device full to 1 (on). If it is set to 0 (off),
Adaptive Server may truncate the audit table that is full before your threshold procedure has a
chance to archive your audit records.

Examples of Setting Auditing Options

Audit all commands issued by the "sa" login, capturing the full command text:

sp_audit "cmdtext", "sa", "all", "on"

Audit all errors that occur in the mydb database:

sp_audit "errors", "all", "mydb", "on"

Audit all inserts into mytable:

sp_audit "insert", "all", "mytable", "on"

Audit all future stored procedures that may be executed:

sp_audit "exec_procedure","all","default procedure","on"

The Audit Queue

When an audited event occurs, an audit record first goes to the in-memory audit queue. The
record remains in memory until the audit process writes it to the audit trail. You can configure
the size of the audit queue with the audit queue size parameter of sp_configure. Before you
configure the size of the audit queue, consider the trade-off between the risk of losing records
in the queue if the system crashes and the loss of performance when the queue is full. As long
as an audit record is in the queue, it can be lost if the system crashes. However, if the queue
repeatedly becomes full, overall system performance is affected. If the audit queue is full
when a user process tries to generate an audit record, the process sleeps until space in the
queue becomes available.

The larger the queue, the more audit records you stand to lose in a system crash. The smaller
the queue, the more often it fills; a full queue affects overall system performance, and user
processes that generate audit records sleep while it is full.

Size the audit queue for your site's needs. If the audit records in the queue are less crucial to
you than raw performance and normal user operations, set the queue to a larger size. If the
opposite is true, keep the audit queue relatively small.

• The memory requirement for a single audit record is 424 bytes in the queue, while a
record can be as small as 22 bytes when it is written to a data page.
• The maximum number of audit records that can be lost in a system crash is the size of
the audit queue (in records) plus 20. The additional 20 records are records that remain
on a buffer page. These pages are only flushed to disk every 20 writes. If the system
crashes, the audit queue, as well as the buffer page, are lost.
• In the system audit tables, the extrainfo field and the fields containing names are of
variable length, so audit entries can vary in size. The number of audit records that can
fit on a page varies from four to 80 or more. The memory requirement for the default
audit queue size of 100 is approximately 42K. Set this value according to your site's
needs (see the worked example below).
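For example, raising audit queue size to 300 would reserve roughly 300 x 424 bytes, or about
124K, of memory for the queue, and up to 320 audit records (the queue plus the 20-record
buffer page) could be lost in a crash.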

Querying the Audit Trail

The SSO is the only user who can query the sysaudits tables with a select statement. To query
the audit trail, you will need to write SQL commands to select and summarize the audit data.
If you set up your audit trail correctly, the audit data is automatically archived to one or more
tables in another database. For example, assume that the audit data resides in a table called
audit_data in the audit_archivedb database. To select audit records for all deletions on July 4,
2001, execute:

use audit_archivedb
go
select * from audit_data_table
where event=18
and eventtime like "Jul 4% 01"
go

To select all audit records for tasks performed by user George on July 4, 2001:

use audit_archivedb
go
select * from audit_data_table
where loginname = "george"
and eventtime like "Jul 4% 01"
go
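You can also summarize the archived data rather than listing individual records. The following
sketch, against the same hypothetical archive table, counts events by login and event type:

use audit_archivedb
go
select loginname, event, count(*)
from audit_data_table
group by loginname, event
order by loginname, event
go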

Querying the audit table is rather simple and requires only basic SQL to extract the
information you're looking for. There are also tools that extract audit data in a more
sophisticated manner, such as Rob Verschoor's "qextract & qreport auditing tools," available
at http://www.euronet.nl/~syp_rob/download.html.

These tools can extract data from the sybsecurity database and reconstruct the original
queries. The original queries can then be sent back to the database to gather performance
statistics.

Review of Auditing and Performance

• Use a separate disk — Audit records are written first to a queue in memory and then
to the sybsecurity database (with a buffer page in between). If the database shares a
disk used by other busy databases, it can significantly impair performance.
• Place the sysaudits tables on their own devices — If possible, place the sysaudits
tables on their own devices. If that is not possible, place them on a device that is not
used for most critical applications.
• Use a large enough queue — If the in-memory audit queue fills up, the user
processes that generate audit records sleep. In addition, general performance is slowed
down when the queue is full.
• Audit only the events you need to track — Heavy auditing slows overall system
performance.
• Set a threshold procedure — A threshold procedure will make sure that you don’t
run out of space in a table and will ensure that the server will continue to audit records
without a halt in user or server activity.
• Consider partitioning for heavy auditing — Since your audit tables are heap tables
(tables without a clustered index), consider partitioning them if they receive heavy
inserts.

Chapter 19: Abstract Query Plans


Overview

When Adaptive Server is given a query, it uses the optimizer to develop a plan to resolve the
query. A query plan describes the execution path the server will use to retrieve the data that is
asked for in the query or procedure. The server can also save these plans, along with the query
text, in the table sysqueryplans; saved plans are called abstract query plans. An abstract plan
can be a full plan, which specifies exactly what to do at every step of the query, or a partial
plan, which specifies the join strategy or index to use at particular steps, or simply gives
"hints" about the choices the optimizer should pick between to arrive at the best plan.

The server has a method for comparing incoming queries against the plans already compiled
to see if the query is in fact the same as the one it already has stored as an abstract plan.

Additionally, there is a series of administrative switches and routines that an administrator
may use to affect the behavior of the server in regard to abstract query plans and to administer
the plans.

Server administrators and performance tuners have the ability to store these plans so they
don't have to be recalculated. This saves overall execution time and allows the server to run
more efficiently because it does not have to spend time recompiling a query.

The major uses of abstract query plans are to:

• Compare query plans before and after system upgrades to judge whether the upgrade
will have any effect on performance
• Allow for searching of specific types of queries, such as those using table scans or
specific indexes
• Save a plan to avoid recompiling

Abstract query plans can be saved as a partial plan or a full plan with or without hints to the
optimizer. For instance, a partial plan may suggest or tell the optimizer to use a specific index
on a table but let the optimizer figure out the rest of the plan. A full plan would outline every
step of the execution path. Abstract query plans are created using the create plan command
(discussed later in the chapter).
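As a simple sketch (the table, column, and index names are hypothetical), a plan can also be
saved explicitly with create plan, giving the query text and the plan text as quoted strings:

create plan
"select * from mytable where col1 > 500"
"(i_scan i_col1 mytable)"
go

If the plan is stored in (or copied to) a group that the server loads plans from, it is used
whenever plan association is on and an incoming query matches that query text.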

Abstract query plans allow the user to change the execution path of a query without having to
change the SQL statements. It is another tool to help reduce overall query overhead.
The optimizer may use many different strategies to obtain the data needed for a particular
query. Data access methods include table scans, reformatting, and index scans. Each join may
use a nested-loop join or a merge join. These choices are made by the optimizer via its costing
algorithms. When an abstract plan is captured, these choices are saved along with the query
text in the table sysqueryplans.

Another method to influence a query is using the set command, but it is limited in what it can
do. The option forceplan can make the optimizer use a specific join order, but the other
options only influence the query plan. They include index choice, cache strategy, and
parallel_degree. Not all queries can be influenced by the set command and many third-party
tools do not recognize or have a method to deal with the set commands.

Plans are stored in groups. By default, there are two groups: ap_stdout for plan capture and
ap_stdin for plan association. An administrator or database owner may create other groups,
such as "before" and "after" (which could be used to compare plans from before and after an
upgrade), or a group could be created for testing. Plans may be copied from one group to
another and compared. Abstract plans are generated during query compilation; hence, no
abstract plan is generated for cached query plans, where queries bypass optimization.
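At the session level, capture and association are controlled with set commands. A minimal
sketch, using the default groups:

/* capture abstract plans for this session into ap_stdout */
set plan dump ap_stdout on
go
/* associate incoming queries with plans stored in ap_stdin */
set plan load ap_stdin on
go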

Associating a Query with a Plan

The process of attaching an incoming query with an abstract plan is called association. When
abstract plan association is turned on, an incoming query is processed and an association key
is developed and stored in the table sysqueryplans. An association key is made of a hash key,
the user ID of the current user, the group ID of the current group, and the full query text.

Any incoming query has its hash key computed, and it is compared with any of the already
stored hash keys. If a match is found, the query text is compared; if they match, the plan
already stored is used. A hash key is created by reducing all white space (tabs, returns, and
multiple spaces) to a single space and computing a hash value for the trimmed SQL statement.
Within an abstract plan group, there cannot be two query plans that have the same query text.

Abstract Query Plan Language

The following operators are used in the relational algebra of the abstract query plan:

• g_join (generic join) - High level logical operator for existence, inner, outer joins. Can
be used with nested loop or sort merge joins.
• nl_g_join (nested loop join) - Nested loop operator (inner, outer, or existence)
• m_g_join (merge join) - Merge join (inner or outer)
• union - Union and union all
• scan - Makes table a flow of rows, a derived table, no restriction on access method, a
logical operator
• i_scan - Index scan, a physical operator
• t_scan - Full table scan, a physical operator
• store - Describes materialization of table into stored work table, a logical operator
• nested - Describes placement and structure of nested queries, a filter

The following are additional keywords used in the language for grouping and identification:
• plan - Groups elements for multiple step plan
• hint - Suggestions for a partial plan
• prop (scan properties) - Prefetch, lru, mru, parallel
• table - If correlation names are used, this identifies them for subqueries or views
• work_t - Identifies work table
• in - Names table in subquery (subq) or (view); used with table operator
• subq - Identifies attachment point for subquery's abstract plan

Specifying Access Methods

Given the query:

select * from mytable
where col1 > 500 and col2 < 100

One can specify different access methods.

If an index called i_col1 existed on col1:

(i_scan i_col1 mytable)

Using an index on col2 called i_col2:

(i_scan i_col2 mytable)

For a full table scan:

(t_scan mytable)

Full and Partial Plan Examples

In the above query, (i_scan () mytable) would specify to pick from one of the available
indexes (signified by the use of the inner parentheses). This would eliminate the possibility of
a table scan. Use of i_scan without an index name represents a partial plan.

Identifying Tables

Tables must be identified in the abstract plan in the same way they are identified in the query.
If the table uses the database and owner name, so must the abstract plan. If tables have a
correlation name, as is used in a self join, then the following syntax is used.

Given the query:

...
from mytable mta, mytable mtb

The abstract plan would be written:

(g_join
    (t_scan (table (mta mytable)))
    (t_scan (table (mtb mytable)))
)

In the above example, the table operator is used to create the link between the correlation
name and the table name.

Likewise, in subqueries the table operator is used in conjunction with the subq operator.
Given the query:

select *
from mytable
where col1 in (select col2 from mytable where col1 < 500 )

The abstract plan would be:

(g_join
    (t_scan mytable)
    (i_scan i_col1 (table mytable (in (subq 1))))
)

When three or more tables are joined, the server joins two tables and creates derived tables,
which it joins to the third table.

You can specify the join order in the following way:

(g_join
(g_join (scan mytable3)
(scan mytable2)
)
(scan mytable1))

Here mytable3 is joined to mytable2, and the result is joined to mytable1.

A shortcut notation is:

(g_join
(scan mytable3)
(scan mytable2)
(scan mytable1))

or you could specify a specific index with i_scan:

(g_join
(i_scan index_col1 mytable3)
(i_scan index_col2 mytable2)
(i_scan index_col3 mytable1)
)

Note the above plans specify the join order but not the type of join.

This notation can be useful when set forceplan will not do the job (for instance, when joining
a table to a view).

There are limits on joins and join order. For example, an outer join must use the outer table as
the outer member during join processing; a plan that specifies otherwise results in an error
message, and the query is not compiled.
Specifying Join Types

The 'g' in the g_join operator stands for 'generic,' which means the optimizer is free to choose
either nested loop or merge joins. To specify a particular strategy, use nl_g_join to specify
nested loop joins and m_g_join for merge joins. When the server creates an abstract plan, it
picks the specific algorithm operator (m_g_join or nl_g_join), not g_join. Existence joins can
use either the nl_g_join or g_join operator.

A query joining two tables and searching on one column could use one of three different
strategies in its abstract plan:

• Nested-loop join
• Merge join with specification of using join column indexes
• Merge join creating a work table by selecting qualifying rows based on the SEARCH
clause

Given a query like this:

select * from tab1, tab2
where col11 = col21
and tab1.col3 = 5

Here is how the three abstract plans would look:

Nested loop:

(nl_g_join
(i_scan indx_tab1_col3 tab1)
(i_scan indx_tab2_col1 tab2)
)

For this plan, the first index, indx_tab1_col3, is used by the SEARCH clause; the second
index, indx_tab2_col1, is used for joining to tab2.

A merge join would look like this:

(m_g_join
(i_scan indx_tab_col12 tab1)
(i_scan indx_tab_col22 tab2)
)

This merge join is using the indexes on the join keys, and no interim table is needed.

Alternately, a merge join could select qualifying rows out of the table that has the SEARCH
clause on it, place those rows into a derived table, and join that to the second table. Its plan
would look like this:

(m_g_join
(i_scan indx_tab1_col3 tab1)
(i_scan indx_tab_col22 tab2)
)
Hints and Partial Plans

Sometimes, a full plan is not the best solution. Maybe after running a query you determine
that the only thing you need to do is change the optimizer's decision to pick a table scan rather
than an index. You can do this with a partial plan. To do this, you only need to write that part
of the plan and leave the rest up to the optimizer.

For instance, if you wanted your second table to use an index, you would write:

(i_scan indx_tab2_col2 tab2)

where the second argument is the name of the index and the last is the table name (tab2).

If you had several suggestions for the optimizer, enclose them in parentheses with the word
'hints' at the beginning of the phrase, like the following:

(hints
    (i_scan indx_tab2_col2 tab2)
    (i_scan indx_tab3_col1 tab3)
    (t_scan tab4)
)

Here you are specifying that your second and third tables use an index and your fourth
performs a table scan. No comment is made about your first table, so the optimizer is free to
choose.

Specifying Illegal or Inconsistent Plans

If you specify something that the server isn't allowed to do, your plan won't compile and you
generate an error. For example, if you tell the optimizer to join table1 to table2, and then tell it
to join 2 to 1, you are being inconsistent. The optimizer will not compile your plan and will
generate an error.

Other inconsistent hints may not generate an error. For instance, if you hint to the optimizer to
do a table scan and then hint to use an index, the optimizer will assume it is free to use either.

Creating Abstract Plans for Subqueries

Subqueries are resolved by one of three techniques: materialization, flattening, or nesting.

• Materialization is the execution of the subquery, with the results stored in an
internal table or variable. This is often used to resolve subqueries that compute
aggregates (i.e., sum(), avg(), etc.).
• Flattening changes the subquery into a join. This can improve performance for many
queries.
• Nesting is the running of the query once for every row in the outer query.

You cannot use abstract plans to change the technique by which the subquery is resolved. The
optimizer uses rules to determine this. However, abstract plans can be used to change the way
an inner or outer query is resolved, and in the case of nested query plans, abstract plans can be
used to choose where the query can be nested in the outer query.
As a side note, bear in mind that Sybase has approached the resolution of subqueries and the
resolution of the outer query differently in various releases of the server. Check the 'What's
New' notes on each release of the server to be sure you understand how it is processing
subqueries.

Subqueries are numbered within a query starting with 1. You can change the ordering of
where a subquery is executed within a query, but you have to make sure that the columns the
query is dependent upon are available.

Materialization

Given a query:

select * from tab1
where col1 = (select avg(col5) from tab2)

The plan would materialize (i.e., resolve) the aggregate in the subquery; the second part would
use the result to scan tab1. Take a look at the abstract plan below.

The first step is to materialize the scalar aggregate that is in the subquery:

(plan
    (i_scan i_col1 (table tab2 (in (subq 1))))
    (i_scan i_col1 tab1)
)

The second step is to use the result and scan the table tab1.

Flattening Queries

Whenever the optimizer can, it will flatten a subquery into a join. This allows the optimizer to
pick the order in which it wishes to access the tables.

Furthermore, the optimizer will make the join an existence join when possible. This allows the
server to process the tables in either order and to stop scanning for a given outer row as soon
as the first match is found. With comparison operators that work on columns with unique
indexes, the optimizer will convert the expression into an equi-join.

The g_join and nl_g_join operators let the optimizer detect an existence join.

Joins often produce duplicate rows. A query must either eliminate these duplicates or be sure
it won't produce any. There are three strategies to do this:

• Use a unique index and a normal join


• Reformatting
• Duplicate elimination (by pulling qualifying rows into a work table, sorting, and
eliminating duplicates)

Abstract plans can be used to specify the index to be used in the first strategy, but they have
no effect on the other two strategies.
Changing the Join Order of Flattened Queries

Given the query:

select *
from tab1, tab2
where tab1.col1 = tab2.col1
and tab1.col3 < 100
and exists (select * from tab3 where tab3.col3 != tab1.col3)

The order in which the tables are placed in an abstract plan determines the order in which they
are joined. For the above query, tab3 can be forced to be scanned last, as follows:

(g_join
    (scan tab1)
    (scan tab2)
    (scan (table tab3 (in (subq 1))))
)

To change the order so that tab3 is scanned second:

(g_join
    (scan tab1)
    (scan (table tab3 (in (subq 1))))
    (scan tab2)
)

Nested Subqueries

The nested operator positions the subquery within the larger query.

(g_join
    (nested
        (i_scan indx_t1_c1 tab1)
        (subq 1
            (t_scan (table tab3 (in (subq 1))))
        )
    )
    (i_scan indx_t2_c2 tab2)
)

This would nest the subquery over tab1.

Abstract plans and aggregates: given the query

select min(col2) from table1

the abstract plan is:
(plan
(t_scan table1)
()
)

In the above plan, the first step computes the scalar value and stores the result in an internal
variable, and the second step (the empty parentheses) returns the variable. The parentheses are
empty because there is nothing to optimize.
With vector aggregates, there is a two-step process. Since a vector aggregate will likely return
more than one value, the first step calculates the values and places them into a work table; the
second step scans the work table.

With nested aggregates (a T-SQL extension), there is a third step, which places the scalar
value into an internal variable and returns it.

Extended columns, another T-SQL extension, require additional processing too. For the
following query, the aggregate must be calculated first and then joined back to the base table:

select min(price), price
from t1

Reformatting

Sometimes, when a table is large and has no useful index for a join, the optimizer considers
creating a sorted work table with a clustered index and using a nested-loop join. This strategy
is known as reformatting. Adaptive Server creates a temporary clustered index on the join
column for the inner table. In some cases, creating and using the clustered index on the work
table is actually cheaper than a sort-merge join.

In this query, t2 is very large and has no index:

select *
from t1, t2
where c11 > 0
and c12 = c21
and c22 = 0

The abstract plan that specifies the reformatting strategy on t2 is:

(g_join
    (t_scan t1)
    (scan
        (store Worktab1
            (t_scan t2)
        )
    )
)

In the case of the reformatting strategy, the store operator is an operand of scan. This is the
only case when the store operator is not the operand of a plan operator.
OR Strategy

Adaptive Server uses a RID (row ID) scan to resolve queries with the OR strategy. The query
plan for an OR strategy query is simply described with a scan operator. There is no way to
affect the OR strategy through the use of abstract query plans.

The OR strategy is not specifically described within abstract plans. However, it will not be
used for tables described via the t_scan or the i_scan operators. When the scan operator is
used, however, the optimizer may choose the OR strategy.

Not Specifying a Store Operator

Some multiple-step queries that require work tables do not require multiple-step plans with a
separate work table step and the use of the store operator to create the work table. These are:

• The sort step of queries using the distinct function


• Work tables needed for merge joins
• Work tables needed for union queries
• The sort step when a flattened subquery requires sort to remove duplicates

General Strategy to Write a Plan

The simplest method to write a query plan is to adapt the plan from an existing one. Use the
existing plan for the query or look at a plan from a query that does something similar to what
you want to do and capture it.

Capture the Existing Plan

Use sp_help_qplan to look at it.

Syntax:

sp_help_qplan id [, mode ]

• id - The ID of the abstract plan.


• mode - The type of report to print, which is one of the following:
o full - Returns the plan ID, group ID, hash key, the full query, and the plan text.
o brief - Returns almost the same data as full but truncates the query and plan to
80 characters. This is the default mode.
o list - Returns the number of rows and the number of abstract plans in the group.
Then for each query/plan pair, returns the hash key, plan ID, the first few
characters of the query, and the first few characters of the plan.
o queries - Returns the number of rows and the number of abstract plans in the
group. Then for each plan, returns the hash key, plan ID, and the first few
characters of the query.
o plans - Returns the number of rows and the number of abstract plans in the
group. For each plan, returns the hash key, plan ID, and the first few characters
of the plan.
o counts - Returns the number of rows and the number of abstract query plans in
the group. For each plan, returns the number of rows, number of characters,
hash key, plan ID, and the first few characters of the query.
To create a plan from the adapted text, either wrap it in a create plan statement or attach it
to the SQL statement with a plan clause.

Partial plans should be used to reduce the choices of the optimizer (for example, telling it
what index to use for a specific table). Be aware, however, that the data and its distribution
may change over time, and the usefulness of the stored plans may then become questionable.
Abstract query plans are static and don't change, even if your data distribution does, and
update statistics will have no real effect on a query covered by a plan, since the optimizer is
essentially bypassed.

Comparing 'Before' and 'After' System Upgrades

You can assess the impact a software upgrade will have on your system by comparing the
query plans before and after the upgrade. Essentially, you have to save the plans before you
upgrade and place them in one group, upgrade and save the new plans in another group, and
use the diff mode parameter of sp_cmp_all_qplans to compare the plans.

Here's how it is done step by step:

1. Set the configuration parameter abstract plan dump to 1. This enables server-wide
capture mode and sets the default capture group to ap_stdout.
2. Let the system run to capture all the plans, or most of them. Once the number of rows
in sysqueryplans in the ap_stdout group has stopped growing, you've got most of them.
3. Use sp_copy_all_qplans to copy from ap_stdout to another group. Copying to ap_stdin
will make the plans available to server-wide plan load mode.
4. Use sp_drop_all_qplans to get rid of all query plans in ap_stdout.
5. Make upgrade or tuning changes.
6. Let the server run, and capture query plans again.
7. Use sp_cmp_all_qplans with the diff mode parameter:

sp_cmp_all_qplans ap_stdout, ap_stdin, diff

When capturing the plans, make sure that there is enough time allocated to capture as many
plans as possible. This time can vary depending on the activity on the system. The copying of
plans also requires space, so make sure there is enough space available on the system segment
and in the transaction log for storage of the plans.
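As a sketch of steps 1 through 4, using a hypothetical holding group named before_upgrade:

/* 1. enable server-wide plan capture into ap_stdout */
sp_configure "abstract plan dump", 1
go
/* 3. create a holding group and copy the captured plans into it */
sp_add_qpgroup before_upgrade
go
sp_copy_all_qplans ap_stdout, before_upgrade
go
/* 4. clear ap_stdout before capturing the post-upgrade plans */
sp_drop_all_qplans ap_stdout
go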

Chapter 20: Miscellaneous Topics


Overview

This chapter will cover a few topics that aren't substantial enough to require an entire
chapter, yet they need to be addressed in a performance and tuning book. These topics
include:

• bcp
• dbcc
• tempdb performance issues
• Solid state devices
• Logs
bcp

You've used bcp for things like initial file loads and to make copies of system tables, but it
can seem interminably slow, especially when you have gigabytes, hundreds of gigabytes, or
terabytes to load. Want to speed it up?

bcp Types

When there are no indexes or triggers on a table, bcp does not log the data rows actually being
inserted. Instead, it only logs the data extents and pages actually allocated during the course of
the bcp operation. This is commonly referred to as fast bcp. If, however, there are indexes or
active triggers on a table, the server will log every individual row being inserted. This is
referred to, and rightfully so, as slow bcp and will always run slower than a comparable fast
bcp process. It is possible, however, to place triggers on your table and still run fast bcp.
Starting in version 12, you have the ability to temporarily disable triggers:

alter table [database_name.[owner_name].]table_name


{enable | disable} trigger [trigger_name]

Just disable the trigger before the start of your bcp and re-enable it once completed. This
saves you from having to drop your triggers before any large bcp's. Remember, though, that
by doing this you could be creating potential data or business rule integrity issues. Make sure
that disabling the triggers is not going to cause problems later on.
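For example (the table and trigger names are hypothetical):

alter table mydb.dbo.orders disable trigger tr_orders_ins
go
/* run the bcp in from the operating system, then re-enable the trigger */
alter table mydb.dbo.orders enable trigger tr_orders_ins
go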

Using bcp

Fast bcp will only work in a database when the select into/bulk copy option is activated.

sp_dboption dbname, "select into/bulk copy", true


go
use dbname
go
checkpoint
go

Slow bcp will work either way.

How to Improve bcp Performance

With every tip and suggestion, remember that performance can always vary across platforms
and server instances, so always benchmark first! Given that caveat, here are some suggestions
for improving performance:

• Creating a 16K buffer pool can increase performance significantly. You don't need a
huge 16K pool; you simply need to have one in the cache that the target table uses.
• Test the effect of using multiple simultaneous engines during load into the same table,
even if you're not partitioned (parallel bcp will be covered shortly).
• If you must load sequentially into an indexed (usually clustered) table, presort the data
in Adaptive Server sort order (which is binary by default). This can give you a
performance benefit of up to ten times.
• Parallel bulk copy allows for multiple bulk copy processes to be performed against the
same table simultaneously, with proportional increases in performance.

In order to take advantage of parallel bcp, the table must be partitioned.

Partial syntax for parallel bcp:

bcp db.owner.table:partition# in...

In this syntax, the partition number is the number of the partition that the bcp process is
intending to fill. If you have the hardware to partition sufficiently, parallel bcp is probably the
fastest way to get your data loaded.
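As a sketch, assuming mydb.dbo.bigtab has four partitions and the load file has been split into
four pieces (the server name, login, and file names are hypothetical), you could start four bcp
sessions from the operating system, one per partition:

bcp mydb.dbo.bigtab:1 in bigtab.dat.1 -Usa -Ppassword -SMYSERVER -c &
bcp mydb.dbo.bigtab:2 in bigtab.dat.2 -Usa -Ppassword -SMYSERVER -c &
bcp mydb.dbo.bigtab:3 in bigtab.dat.3 -Usa -Ppassword -SMYSERVER -c &
bcp mydb.dbo.bigtab:4 in bigtab.dat.4 -Usa -Ppassword -SMYSERVER -c &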

dbcc Locking Considerations

With the advent of dbcc checkstorage, you may not be using checkdb or checkalloc. If you
still are, however, here are some tips for maximizing dbcc performance.

In general, the best way to run dbcc is when there is little or no other activity on the server,
and, if possible, as a massively parallel process. If you can't run it by itself, be sure you're not
running it in a way that saturates the server's CPU and I/O capacity, starving all other server
processes.

checktable / checkdb
    Scope of check:  page chains, sort order, data row checks for all indexes
    Locking:         shared table lock, one table at a time (lock A, release A,
                     lock B, release B)
    Speed:           slow
    Coverage:        full coverage

checktable / checkdb with skip_ncindex
    Scope of check:  page chains, sort order, data rows for tables and clustered
                     indexes
    Locking:         same as above
    Speed:           potentially much faster than without the skip_ncindex option,
                     dependent on the number of nonclustered indexes
    Coverage:        partial coverage; nonclustered indexes ignored

checkalloc
    Scope of check:  page chains
    Locking:         table locks; heavy I/O; only allocation pages cached
    Speed:           slow
    Coverage:        full coverage

tablealloc full / indexalloc full
    Scope of check:  page chains
    Locking:         shared table lock; heavy I/O; only allocation pages cached
    Speed:           slow
    Coverage:        full coverage; essentially a distributed checkalloc

tablealloc fast / indexalloc fast
    Scope of check:  OAM pages
    Locking:         shared table lock
    Speed:           fast
    Coverage:        least coverage; only OAM pages checked

checkcatalog
    Scope of check:  system table rows
    Locking:         shared page locks on system tables, released after each page
                     is checked; not much cached
    Speed:           fast
    Coverage:        detailed check of the consistency of rows of certain system
                     tables

checkstorage
    Scope of check:  OAM pages, page chains, data rows
    Speed:           fast
    Coverage:        detailed information in a database
dbcc checkdb
    Scope of check:  all tables in a database
    When to run:     before a database dump, or if inconsistencies are suspected

dbcc checktable
    Scope of check:  a specific table's consistency
    When to run:     if a certain table is suspect, or as part of a maintenance plan
                     (check 1/2 of the tables on Sunday, 1/2 on Wednesday)

dbcc checkcatalog
    Scope of check:  rows in the system tables, checked for consistency
    When to run:     before a database dump, or if inconsistencies are suspected

dbcc checkalloc
    Scope of check:  allocation for all tables and indexes in a database
    When to run:     before a database dump, or if inconsistencies are suspected,
                     with the fix option

dbcc tablealloc
    Scope of check:  allocation for a specific table (or clustered index)
    When to run:     with the fix option when a table is identified by checkalloc,
                     or as part of a maintenance plan (check 1/2 of the tables on
                     Sunday, 1/2 on Wednesday)

dbcc indexalloc
    Scope of check:  index page pointers
    When to run:     with the fix option when an index is identified by checkalloc,
                     or as part of a maintenance plan (check 1/2 of the indexes on
                     Sunday, 1/2 on Wednesday)

dbcc fix_al
    Scope of check:  fixes allocation pages reported by checkalloc
    When to run:     only useful on pre-System 10 servers; in System 10 it basically
                     does a checkalloc with the fix option

dbcc checkstorage
    Scope of check:  allocation and consistency for tables
    When to run:     runs very fast; does not lock the data pages for extended
                     periods of time, and stores its output in a database for later
                     reporting

tempdb Performance Issues

tempdb can be a critical bottleneck for the entire Adaptive Server. It is used by all users
explicitly creating temporary tables and by the server for internal work tables (sorts,
reformatting, etc.).
A significant amount of locking can occur in tempdb. This is counterintuitive because you
would think that the tables have nothing to do with one another. That is, until you start
thinking about the system tables which track creation and deletion of the tables, rows,
columns, etc.

tempdb Performance Tips

Keep temporary tables small. Select only required columns rather than "select *." One of the
most frequent mistakes I've seen in tempdb is the creation of temp tables with more rows than
necessary. Select into temp tables only the rows that are absolutely necessary.

You can place indexes on temp tables. If the temp table is of sufficient size and is going to be
accessed multiple times, it may be cost effective to create an index on it.

Avoid unnecessary work tables. Create a clustered index on frequent 'order by' column(s).

Consider performing sorts in the application rather than the server if you're sorting only a few
rows and doing it often.

Bind the data and log segments (you do have separate data and log segments, don't you?) to
their own named caches. This will prevent your default data cache from being impacted by
heavy tempdb activity.
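A minimal sketch of that approach follows; the cache name and size are hypothetical, and the
tempdb binding may not take full effect until the server is restarted:

/* create a named cache (size is illustrative) */
sp_cacheconfig tempdb_cache, "100M"
go
/* bind tempdb to the new cache */
sp_bindcache tempdb_cache, tempdb
go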

Consider Placing tempdb on a Faster I/O Device

There are a variety of hardware solutions to the I/O bottleneck that can occur in tempdb:

• File system device


• Solid state device
• RAM drive
• DB accelerators

Faster devices can also help reduce the lock contention.

File System Devices

Some shops have reported significant performance improvements (e.g., 20:1) using a file
system device for tempdb. With the advent of 12.0, you may already be using file system I/O
(along with the data files). Due to platform differences, you should test on your own system to
confirm expected performance gains.

Solid State Devices

A number of shops have reported significant performance gains by placing tempdb on a solid
state device (SSD).

Basically, it is static RAM configured to look like a standard file system I/O device. I've never
seen SSD not resolve an I/O bottleneck in tempdb, but the cost is not for everyone.

tempdb on RAM Drive


If supported by the OS, consider creating a RAM drive and placing tempdb on it. This
approach essentially "robs" memory that could otherwise be used as data cache and dedicates
it to tempdb use only.

Advantages of Assigning Tempdb to Its Own Data Cache

If you have heavy tempdb utilization, as well as high contention in the default data cache,
consider placing tempdb in its own data cache. This reduces heavy use of the data cache when
temporary tables are created, populated, and then dropped. It also keeps the activity on
temporary objects from flushing other objects out of the default data cache.

Important Tip

If you have chosen a faster device for tempdb and are not getting the expected performance
benefit out of it, perhaps you are still using the first few megabytes of tempdb on the master
device.

There are three ways to get tempdb off the master device.

First, you can simply drop all three segments from that fragment of tempdb by directly
modifying the sysusages table and changing the segmap to 0 for that fragment:

sp_configure "allow updates",1


update sysusages set segmap = 0 where dbid=2 and vstart = 0
sp_configure "allow updates",0

This is the recommended approach!

Secondly, you can completely drop this fragment. (Use delete instead of update above, and
then recycle the server.) This is not recommended, as recovery becomes complex.

Finally, you can create the database elsewhere, and then move it to tempdb's position.

Sample Session Moving tempdb

1. Create a new database where you want tempdb to reside.

create database newtemp on newdevice = 200

2. Modify sysusages and sysdatabases to remove the old tempdb rows.

1> sp_configure 'allow updates', 1
2> go
1> reconfigure with override
2> go
1> begin tran
2> go
1> delete sysusages from sysusages u, sysdevices d
2> where vstart between low and high and dbid = 2
3> go
(4 rows affected)
1> delete sysdatabases where dbid = 2
2> go
(1 row affected)

3. Modify sysusages and sysdatabases to change the new database's references to tempdb
(dbid = 2), then verify the results before committing:

1> select * from sysdatabases
2> where name = 'tempdb'
3> go
name    dbid  suid  mode  status  version  logptr  crdate              dumptrdate
------------------------------
tempdb  2     1     0     0       1        264     Dec 8 1994 10:36AM  Dec 8 1994 10:36AM
(1 row affected)
1> select * from sysusages where dbid = 2
2> go
dbid  segmap  lstart  size  vstart
------------------------------
2     7       0       5120  16793600
(1 row affected)
1> commit tran
2> go
1> sp_configure 'allow updates', 0
2> go

Locking Contention in tempdb

Locking contention can occur in tempdb if a large number of concurrent users are creating or
dropping temp tables or their indexes. You can minimize contention by minimizing the size of
tables in tempdb, using permanent temp tables, or increasing the I/O speed of tempdb.
Another, more radical approach involves the creation of pseudo-tempdbs. These are work
databases that are used like tempdb. One of the major issues with this approach is that you
have to monitor these databases so they do not run out of space, as tables created in them will
not be automatically dropped by Adaptive Server.

Log Bottlenecks

Log bottlenecks have not been as common since Sybase introduced the concept of user log
cache (ULC) and made it tunable.
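The ULC size is set with sp_configure; the value below is only illustrative and should be
tuned to your typical transaction size:

sp_configure "user log cache size", 4096
go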

There are a few tried-and-true solutions to log bottlenecks. You can get faster I/O devices, and
you can split your databases up into multiple databases (multiple databases = multiple logs, if
you're set up properly).

There's also a last-ditch trick that you can try. Adaptive Server does not physically write to
the log until the final commit is issued (@@trancount becomes zero). Therefore, you can
reduce physical writes if you can reduce commits. Reduce commits by wrapping related SQL
statements in a single transaction. This can potentially increase locking contention, as locks
will be held longer.

Instead of:

insert order values (@cust_num, @order_num)


insert order_detail values
(@order_num, @qty, @unit_price)
update commission set total_sales = total_sales +
@qty*@unit_price
where salesperson_id = suser_name()
select "New total sales",total_sales from commission
where salesperson_id = suser_name()

Try:

begin tran
insert order values (@cust_num, @order_num)
insert order_detail values
(@order_num, @qty, @unit_price)
update commission
set total_sales = total_sales +
@qty*@unit_price
where salesperson_id = suser_name()
commit tran
select "Total sales",total_sales from commission
where salesperson_id = suser_name()
Note The select was left out of the transaction.

Be aware that this technique, while minimizing logging, increases the chances for lock
contention.

Summary

There's always another way, but test it thoroughly before migrating it into production.
