
Maximiliano Damian Accotto

MVP en SQL Server


http://www.triggerdb.com
http://blog.maxiaccotto.com

Best Practices:

Establish a baseline
Repeat (if desired)
Measure performance
Optimize for real-world workloads
Identify bottlenecks
Monitor/review performance regularly
Make one change at a time
Focus on specific issues

System/OS
SQL Server
Query Level

Activity Monitor
Windows Performance Monitor
Database Engine Tuning Advisor
SQL Profiler / SQL Trace
Alerts (Performance-Based)
Dynamic Management Views (DMVs)
Query Execution Plans

Category | Metric
Largest single database | 70 TB
Largest table | 20 TB
Biggest total data, 1 application | 88 PB
Highest database transactions per second, 1 db (from Perfmon) | 130,000
Fastest I/O subsystem in production (SQLIO 64k buffer) | 18 GB/sec
Fastest real-time cube | 5 sec latency
Data load for 1 TB | 30 minutes
Largest cube | 12 TB

Company Profile

World's largest publicly listed online gaming platform

20 million registered customers in more than 25 core markets
>14,000 bets offered simultaneously on more than 90 sports
~90 live events with video every day: bwin is the world's largest broadcaster of live sports
>70,000 payment transactions per day (PCI Level 1 and ISO 27001 certified)

Business Requirements

Failure is not an option


100% transactional consistency, zero data loss
99.998% availability...even after loss of a data center
Performance critical
Must scale to handle every user and give them a great experience
Protect users' privacy and financial information
Provide a secure PCI compliant environment for all customers

SQL Server Environment


100+ SQL Server Instances
120+ TB of data
1,400+ Databases
1,600+ TB storage
450,000+ SQL Statements per second on a single server
500+ Billion database transactions per day
Core component in solutions designated for:
Financial transactions
Gaming environments
Tracking user state throughout the system
Solutions primarily scale-up using commodity hardware

SQL Server Infrastructure

Almost 200 production SQL Server instances

Single High Transaction throughput system provides:

Mission critical to the business in terms of performance and availability

Project Description
Maintains US Equities and Options trading data

Processing 10s of billions of transactions per day


Average over 1 million business transactions/sec into SQL Server
Peak: 10 million/sec
Require last 7 years of online data
Data is used to comply with government regulations
Requirements for real-time query and analysis
Approximately 500 TB per year, totaling over 2PB of uncompressed data
Largest tables approaching 10TB (page compressed) in size

Early adopter of, and upgrade to, SQL Server 2014 in order to:


Better manage data growth
Improve query performance
Reduce database maintenance time

Data at this scale requires breaking things down into manageable units:
Separate data into different logical areas:
A database per subject area (17)
A database per subject area per year (last 7 years)
Table and Index Partitioning:
255 partitions per database
25,000 filegroups
Filegroup to partition alignment for easier management/less impact moving data
Filegroup backups
Taking advantage of compression:
Compression per partition
Backup compression
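
The per-partition compression and filegroup backup points above can be sketched in T-SQL roughly as follows. This is only an illustration: the database `TradeDB`, the table `dbo.Trades`, the filegroup `FG_2014_01`, the partition number, and the backup path are all hypothetical names, not taken from the case study.

```sql
-- Hypothetical names throughout: TradeDB, dbo.Trades, FG_2014_01, X:\Backup.
-- Rebuild one partition of a partitioned table with page compression,
-- leaving the other partitions untouched.
ALTER TABLE dbo.Trades
REBUILD PARTITION = 3
WITH (DATA_COMPRESSION = PAGE);

-- Back up a single filegroup and compress the backup itself.
BACKUP DATABASE TradeDB
FILEGROUP = N'FG_2014_01'
TO DISK = N'X:\Backup\TradeDB_FG_2014_01.bak'
WITH COMPRESSION;
```

Aligning filegroups with partitions, as the slide recommends, is what lets both statements operate on one slice of the data at a time.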

Hardware
Operating System
SQL Server
Database Design
Application

Use Disk Alignment at 1024KB

Use GPT if MBR not large enough


Format partitions at 64KB allocation unit size
One partition per LUN
Only use Dynamic Disks when there is a need to
stripe LUNs using Windows striping (i.e. Analysis
Services workload)
Tools:
Diskpar.exe, DiskPart.exe and DmDiag.exe
Format.exe, fsutil.exe
Disk Manager

Here is a graph of performance improvement from Microsoft's white paper:

Sector Alignment
Basic MBR Partition Example

Commands
Diskpart
Select disk 0
List partition


RAID-1 is OK for log files and datafiles, but you can do better
RAID-5 is a BIG NO! for anything except read-only or read-mostly datafiles
RAID-10 is your best bet (but most expensive)
NEVER put OLTP log files on RAID-5!
If you can afford it:
Stripe And Mirror Everything (SAME): one HUGE RAID-10
SSD is even better: consider it for tempdb and/or log files
If adventurous, use RAW partitions (see BOL)

As much as you can get... and more!
64-bit is great for memory-intensive workloads
If still on 32-bit, use AWE
Are you sharing the box? How much memory needs to be set aside? Set max/min server memory as needed.
Observe where all this memory goes:
Data Cache vs. Procedure Cache vs. Lock Manager vs. Other
Keep an eye out for the "A significant part of sql server process memory has been paged out" error in the errorlog.
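
As a rough sketch of the two memory points above (capping server memory and observing where memory goes), something like the following could be used. The 57344 MB cap is an arbitrary illustrative value, and the `pages_kb` column assumes SQL Server 2012 or later (earlier versions split it into `single_pages_kb` and `multi_pages_kb`).

```sql
-- Cap the buffer pool so the OS and other processes keep some memory.
-- 57344 MB is a made-up value for illustration; size it for your own box.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max server memory (MB)', 57344;
RECONFIGURE;

-- See which memory clerks (data cache, plan cache, lock manager, ...) hold the most memory.
-- Assumes SQL Server 2012+, where sys.dm_os_memory_clerks exposes pages_kb.
SELECT TOP (10)
       [type],
       SUM(pages_kb) / 1024 AS size_mb
FROM sys.dm_os_memory_clerks
GROUP BY [type]
ORDER BY size_mb DESC;
```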

Min/max server memory when needed.

Locked pages:
32-bit: when using AWE
x64 Enterprise Edition: just grant the Lock Pages in Memory privilege
x64 Standard Edition: must have hotfix and enable TF845 (see KB970070 for details)

Large Pages:
ONLY dedicated 64-bit servers with more than 8GB of RAM!
Enabled with TF834 (see KB920093)
Server is sloooooooow to start: be warned!

CPU is rarely the real bottleneck: look for WHY we are using so much CPU power!
Use affinity mask as needed:
Splitting the CPUs between applications (or SQL instances)
Moving SQL Server OFF the CPU that serves NIC IRQs

With a really busy server:
Increase max worker threads (but be careful: it's not free!)
Consider lightweight pooling (be SUPER careful: no SQLCLR and some other features; see KB319942 and BOL).

Parallelism is good:
Gives you query results faster
But at a cost of using a lot more CPU resources

MAXDOP setting is your friend:
On server level (sp_configure 'max degree of parallelism')
On Resource Governor workload group
On a single query (OPTION (MAXDOP 1))

Often overlooked:
sp_configure 'cost threshold for parallelism' (default 5)
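
The three MAXDOP scopes and the cost threshold setting might look like this in practice. The specific values (4, 50, 2) are illustrative assumptions only, not recommendations from the deck; `Person.Person` is the AdventureWorks table used elsewhere in these slides.

```sql
-- Server level: both options are advanced options.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max degree of parallelism', 4;        -- illustrative value
EXEC sys.sp_configure N'cost threshold for parallelism', 50;  -- illustrative value (default is 5)
RECONFIGURE;

-- Resource Governor workload group level (here the built-in default group).
ALTER WORKLOAD GROUP [default] WITH (MAX_DOP = 2);             -- illustrative value
ALTER RESOURCE GOVERNOR RECONFIGURE;

-- Single query level: force a serial plan for this statement only.
SELECT COUNT(*)
FROM Person.Person
OPTION (MAXDOP 1);
```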

Data file layout matters


Choose your Recovery Model carefully:
Full: highest recoverability but lowest performance
Bulk-logged: middle ground
Simple: no log backups, bulk operations minimally logged

Always leave ON:
Auto create statistics
Auto update statistics

Always leave OFF:
Auto shrink
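
A minimal sketch of applying these database options, assuming a hypothetical database named `SalesDB` and assuming the Full recovery model is the one chosen:

```sql
-- SalesDB is a hypothetical database name.
ALTER DATABASE SalesDB SET RECOVERY FULL;               -- or BULK_LOGGED / SIMPLE
ALTER DATABASE SalesDB SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE SalesDB SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE SalesDB SET AUTO_SHRINK OFF;

-- Verify the resulting settings.
SELECT name, recovery_model_desc,
       is_auto_create_stats_on, is_auto_update_stats_on, is_auto_shrink_on
FROM sys.databases
WHERE name = N'SalesDB';
```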

Rebuild
Optimizes processing times
Uses more CPU cores

ALTER INDEX ALL ON Person.Person
REBUILD WITH (MAXDOP = 4);

MaxDOP | CPU ms | Duration ms
… | 7344 | 7399
… | 9797 | 5997
… | 15845 | 5451

Designing High Performance I/O systems

SQL Server's View of I/O

High rate of allocations to any data files can result in scaling issues due to contention on allocation structures
Impacts decision for number of data files per file group
Especially a consideration on servers with many CPU cores

PFS/GAM/SGAM are structures within a data file which manage free space
Easily diagnosed by looking for contention on PAGELATCH_UP
Either real time on sys.dm_exec_requests or tracked in sys.dm_os_wait_stats

Resource description in form of DBID:FILEID:PAGEID
Can be cross referenced with sys.dm_os_buffer_descriptors to determine type of page
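
The diagnosis described above could be sketched as two queries: one for accumulated page latch waits, one to look up the type of a specific page. The `2:1:1` resource used in the second query is just an example value of the DBID:FILEID:PAGEID form.

```sql
-- Accumulated page latch waits since the last restart (or since stats were cleared).
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE N'PAGELATCH[_]%'
ORDER BY wait_time_ms DESC;

-- Cross-reference a resource description such as 2:1:1 (DBID:FILEID:PAGEID)
-- with the buffer pool to see what kind of page it is.
-- The row only appears while the page is in memory.
SELECT database_id, file_id, page_id, page_type
FROM sys.dm_os_buffer_descriptors
WHERE database_id = 2
  AND file_id = 1
  AND page_id = 1;
```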

More data files does not necessarily equal better performance
Determined mainly by 1) hardware capacity & 2) access patterns

Number of data files may impact scalability of heavy write workloads
Potential for contention on allocation structures (PFS/GAM/SGAM; more on this later)
Mainly a concern for applications with a high rate of page allocations on servers with >= 8 CPU cores

Can be used to maximize # of spindles: data files can be used to stripe the database across more physical spindles

Provides less flexibility with respect to mapping data files into differing storage configurations
Multiple files can be used as a mechanism to stripe data across more physical spindles and/or service processors (applies to many small/mid range arrays)
A single file prevents possible optimizations related to file placement of certain objects (relatively uncommon)
Allocation-heavy workloads (PFS contention) may incur waits on allocation structures, which are maintained per file.

The primary filegroup contains all system objects
These CANNOT be moved to another filegroup
If using filegroup-based backup, you must back up PRIMARY as part of regular backups
If not, you cannot restore!
Primary must be restored before other filegroups

Best Practice:
Allocate at least one additional filegroup and set this to the default.
Do not place objects in Primary
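
One way to follow this best practice is sketched below. The database name `SalesDB`, the filegroup name `UserData`, the file path, and the sizes are all hypothetical.

```sql
-- Hypothetical names, path, and sizes.
ALTER DATABASE SalesDB ADD FILEGROUP UserData;

ALTER DATABASE SalesDB
ADD FILE (
    NAME = N'SalesDB_UserData1',
    FILENAME = N'D:\SQLData\SalesDB_UserData1.ndf',
    SIZE = 10GB,
    FILEGROWTH = 1GB
)
TO FILEGROUP UserData;

-- New tables and indexes now land in UserData unless they name another filegroup.
ALTER DATABASE SalesDB MODIFY FILEGROUP UserData DEFAULT;
```

A filegroup must contain at least one file before it can be made the default, which is why the file is added first.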


Virtual LOG Files & Performance


TRACE FLAGs


DBCC TRACEON
Use -1 to turn on trace flag globally

DBCC TRACEOFF
DBCC TRACESTATUS
-T startup flag
Use -T# separated by semi-colon (;)
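
As a short illustration of the commands listed above, using trace flag 1118 as the example flag (any flag discussed in this deck could be substituted):

```sql
-- Turn trace flag 1118 on for the whole instance (-1 = global scope).
DBCC TRACEON (1118, -1);

-- List every trace flag currently enabled globally.
DBCC TRACESTATUS (-1);

-- Turn it off again globally.
DBCC TRACEOFF (1118, -1);

-- A flag set with DBCC TRACEON is lost at restart. To make it permanent,
-- add it to the service startup parameters instead, e.g.:  -T1118;-T1117
```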

Trace flag 610 controls minimally logged inserts into indexed tables
Allows for high volume data loading
Less information is written to the transaction log
Transaction log file size can be greatly reduced
Introduced in SQL Server 2008
Very fussy
Documented:
Data Loading Performance Guide white paper
http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx

Trace flag 1224 disables lock escalation based on the number of locks
Memory pressure can still trigger lock escalation
The Database Engine escalates row or page locks to table locks when lock memory exceeds 40% of the memory available for locking:
40% of the lock memory configured with sp_configure 'locks', or
40% of non-AWE Database Engine memory when 'locks' is set to 0
Scope: Global | Session
Documented: BOL

Trace flag 1117 forces all files in a filegroup to auto-grow at the same time

Trace flag 1118 directs SQL Server to allocate full extents to each tempdb object (instead of mixed extents)
Less contention on internal structures such as SGAM pages
Story has improved in subsequent releases of SQL Server

Local and global temporary tables (and indexes if created)
User-defined tables and indexes
Table variables
Tables returned in table-valued functions

Note: This list, and the following lists, are not designed to be all inclusive.

Work tables for DBCC CHECKDB and DBCC CHECKTABLE.
Work tables for hash operations, such as joins and aggregations.
Work tables for processing static or keyset cursors.
Work tables for processing Service Broker objects.
Work files needed for many GROUP BY, ORDER BY, UNION, SORT, and SELECT DISTINCT operations.
Work files for sorts that result from creating or rebuilding indexes (SORT_IN_TEMPDB).

The version store is a collection of pages used to store row-level versioning of data.
There are two types of version stores:
1. Common Version Store: Examples include:
Triggers.
Snapshot isolation or read-committed snapshot isolation (uses less TEMPDB than snapshot isolation).
MARS (when multiple active result sets are used).
2. Online-Index-Build Version Store:
Used for online index builds or rebuilds. Enterprise Edition only.

TEMPDB is dropped and recreated every time the SQL Server service is stopped and restarted.
When SQL Server is restarted, TEMPDB inherits many of the characteristics of model, and creates an MDF file of 8MB and an LDF file of 1MB (default setting).
By default, autogrowth is set to grow by 10% with unrestricted growth.
Each SQL Server instance may have only one TEMPDB, although TEMPDB may have multiple physical files.

Many TEMPDB database options can't be changed (e.g. Database Read-Only, Auto Close, Auto Shrink).
TEMPDB only uses the simple recovery model.
TEMPDB may not be backed up, restored, be mirrored, have database snapshots made of it, or have many DBCC commands run against it.
TEMPDB may not be dropped, detached, or attached.

TEMPDB logging works differently from regular logging.
Operations are minimally logged, as redo information is not included, which reduces TEMPDB transaction log activity.
The log is truncated constantly during the automatic checkpoint process, and should not grow significantly, although it can grow with long-running transactions, or if disk I/O is bottlenecked.
If a TEMPDB log file grows wildly:
Check for long-running transactions (and kill them if necessary).
Check for I/O bottlenecks (and fix them if possible).
Manually running a checkpoint can often temporarily reduce a wildly growing log file if bottlenecked disk I/O is the problem.

Generally, there are three major problems you run into with TEMPDB:
1. TEMPDB is experiencing an I/O bottleneck, hurting server performance.
2. TEMPDB is experiencing contention on various global allocation structures (metadata pages) as temporary objects are being created, populated, and dropped. E.g., any space-changing operation acquires a latch on PFS, GAM or SGAM pages to update space allocation metadata. A large number of such operations can cause excessive waits while latches are acquired, creating a bottleneck (hotspot), and hurting performance.
3. TEMPDB has run out of space.

Ideally, you should be monitoring all these on a proactive basis to identify potential problems.

Use Performance Monitor to determine how busy the disk is where your TEMPDB MDF and LDF files are located.
LogicalDisk Object: Avg. Disk Sec/Read: The average time, in seconds, of a read of data from disk. Numbers below are a general guide only and may not apply to your hardware configuration.
Less than 10 milliseconds (ms) = very good
Between 10-20 ms = okay
Between 20-50 ms = slow, needs attention
Greater than 50 ms = serious I/O bottleneck

LogicalDisk Object: Avg. Disk Sec/Write: The average time, in seconds, of a write of data to the disk. See above guidelines.
LogicalDisk: % Disk Time: The percentage of elapsed time that the selected disk drive is busy servicing read or write requests. A general guideline is that if this value is > 50%, there is a potential I/O bottleneck.

Use these performance counters to monitor allocation/deallocation contention in SQL Server:
Access Methods: Worktables Created/sec: The number of work tables created per second. Work tables are temporary objects and are used to store results for query spool, LOB variables, and cursors. This number should generally be less than 200, but can vary based on your hardware.
Access Methods: Workfiles Created/sec: Number of work files created per second. Work files are similar to work tables but are created by hashing operations. Used to store temporary results for hash joins and hash aggregates. High values may indicate contention potential. Create a baseline.
Temp Tables Creation Rate: The number of temporary tables created/sec. High values may indicate contention potential. Create a baseline.
Temp Tables For Destruction: The number of temporary tables or variables waiting to be destroyed by the cleanup system thread. Should be near zero, although spikes are common.

Minimize the use of TEMPDB
Enhance temporary object reuse
Add more RAM to your server
Locate TEMPDB on its own array
Locate TEMPDB on a fast I/O subsystem
Leave Auto Create Statistics & Auto Update Statistics on
Pre-allocate TEMPDB space: everyone needs to do this
Don't shrink TEMPDB if you don't need to
Divide TEMPDB among multiple physical files
Avoid using Transparent Data Encryption (2008)
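
Pre-allocating space and dividing TEMPDB among multiple files might be done along these lines. `tempdev` is the default logical name of the first TEMPDB data file; the second file's name, the `T:\TempDB` path, and the sizes are hypothetical.

```sql
-- Pre-size the existing data file so it does not have to autogrow under load.
ALTER DATABASE tempdb
MODIFY FILE (NAME = N'tempdev', SIZE = 4GB, FILEGROWTH = 512MB);

-- Add a second, equally sized data file (hypothetical name and path).
ALTER DATABASE tempdb
ADD FILE (
    NAME = N'tempdev2',
    FILENAME = N'T:\TempDB\tempdev2.ndf',
    SIZE = 4GB,
    FILEGROWTH = 512MB
);
```

Keeping the files the same size matters because SQL Server's proportional fill favors files with more free space; unequal files would concentrate allocations on one of them.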

Generally, if you are building a new SQL Server instance, it is a good idea to assume that TEMPDB performance will become a problem, and to take proactive steps to deal with this possibility.
It is easier to deal with TEMPDB performance issues before they occur than after they occur.
The following TEMPDB performance tips may or may not apply to your particular situation.
It is important to evaluate each recommendation, and determine which ones best fit your particular SQL Server instance. This is not a one-size-fits-all approach.

If latches are waiting to be acquired on TEMPDB pages for various connections, this may indicate allocation page contention.
Use this code to find out:
SELECT session_id, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGE%LATCH_%'
  AND resource_description LIKE '2:%';

Allocation Page Contention:
2:1:1 = PFS Page
2:1:2 = GAM Page
2:1:3 = SGAM Page
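
The query above can be extended to label each waiting page automatically. The sketch below assumes SQL Server 2012 or later (for `TRY_CONVERT`) and relies on the documented repeat intervals of allocation pages: a PFS page every 8,088 pages, and a GAM/SGAM pair every 511,232 pages. The `page_kind` column name is made up for this example.

```sql
-- Classify TEMPDB page latch waits as PFS, GAM, SGAM, or other pages.
SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.resource_description,
       CASE
           WHEN p.page_id = 1 OR p.page_id % 8088 = 0         THEN 'PFS'
           WHEN p.page_id = 2 OR p.page_id % 511232 = 0       THEN 'GAM'
           WHEN p.page_id = 3 OR (p.page_id - 1) % 511232 = 0 THEN 'SGAM'
           ELSE 'Other'
       END AS page_kind
FROM sys.dm_os_waiting_tasks AS wt
CROSS APPLY (
    -- resource_description looks like DBID:FILEID:PAGEID; take the last part.
    -- TRY_CONVERT yields NULL instead of an error for unexpected formats.
    SELECT TRY_CONVERT(bigint,
               PARSENAME(REPLACE(wt.resource_description, ':', '.'), 1)) AS page_id
) AS p
WHERE wt.wait_type LIKE 'PAGE%LATCH_%'
  AND wt.resource_description LIKE '2:%';
```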

Installation & Configuration Best Practices for Performance

Server Role. The server should be a member server of a Microsoft Active Directory network, and dedicated only to SQL Server. Windows File, Print, and Domain Controller services should be left for other machines.

BIOS. Change Power Management to Maximum Performance.

BIOS. Disable QPI Power Management.

BIOS. Change Power Profile to Maximum Performance.

System Architecture. Use a 64-bit architecture server.

BIOS. Change Power Regulator to High Performance Mode.

32-Bit Systems. Include the /PAE parameter in the boot.ini file on Windows Server 2003 on servers with more than 4GB RAM.

SQL Server Edition. Use the DEVELOPER edition on development and test servers. Use the ENTERPRISE edition on QA and Production servers.

RAM Modules. Validate with the server's manufacturer the low-latency recommendations on CPU and memory SIMM combinations, as well as memory SIMM placement on multiple memory channels per processor.

RAM per CPU Core. For OLTP systems, use 2GB-4GB RAM per CPU Core.

RAM per CPU Socket in Fast Track v3 (Data Warehousing). For 2-CPU Socket use a minimum of 96 GB RAM. For 4-CPU Socket use a minimum of 128 GB RAM. For 8-CPU Socket use a minimum of 256 GB RAM.

Processor Scheduling. Be sure that in Computer properties, Performance Options, the Processor Scheduling parameter is configured for Background Services.

Network Interface Cards. Have at least two network interface cards connected to two different networks in order to separate application load from administrative load.

CPU Cache. Use servers with CPUs that have L3 memory cache.

Whitepapers. Look for Low-Latency best configurations on server manufacturers' websites.

BIOS. Disable CPU Hyper-Threading (or Logical Processor) at the BIOS level. Use Intel's Processor ID utility to verify it.

BIOS. Disable CPU Turbo Mode (or Turbo Boost Optimization).

BIOS. Disable CPU C-States (or C-3, C6, etc.).

BIOS. Disable CPU C1E.


Network Interface Cards. Configure each network interface adapter for "Maximize data throughput for network applications".

Network Interface Cards. For OLAP systems (Data Warehouses and Cubes), Database Mirroring, Log Shipping, and Replication, evaluate using Jumbo Frames (9,000-byte frames) on all devices that interact with each other (switches, routers, and NICs).

Fast Track v3 (DW) Disks. For Windows operating system and SQL Server binary files, use a 2-disk-spindle RAID-1 local disk array.

Disk Volumes. Assign separate virtual disks (ex. SAN LUNs) for SQL Server data, log, tempdb, backups.

Disk Host Bus Adapter (HBA). Insert the HBA adapter into the fastest PCI-E slot.
PCIe x4 v2.0 delivers up to 2GB/sec.
PCIe x4 v1.0 delivers up to 1GB/sec.
PCIe x1 v2.0 delivers up to 500MB/sec.
PCIe x1 v1.0 delivers up to 250MB/sec.

Disk Volumes. Use Solid-State (SSD) disks or 15K disks.

Disk Volumes. Use RAID-10 (or RAID-1) arrays when possible. Use RAID-5 as a last option. Never use RAID-0. RAID-5 is excellent for reading, but not best for writing (especially bad for random writes). On direct-attached systems (DAS), if you need to balance performance and space between solid-state disks (SSD) and 15K disks (SAS), one strategy is to have solid-state disks at RAID-5 and 15K disks at RAID-10.

RAID Controller. In virtual disks, set the cache configuration to Write Policy = Write-Through (instead of Write-Back). The objective is to acknowledge completion of a transaction to the operating system only when it has been written to the storage system, not just to the RAID controller's cache. Otherwise, there is a consistency risk if the controller's battery is not working and power is lost.

Disk Host Bus Adapter (HBA). Configure the HBA's Queue Depth parameter (in the Windows Registry) with the value that reports the best performance on SQLIO tests (x86 and x64 only) or SQLIOSIM (x86, x64, and IA64).

Fast Track v3 (DW) Disks. For data files (*.MDF, *.NDF) use multiple SAN/DAS storage enclosures that have multiple RAID-10 groups, each one with at least 4 spindles, but dedicate one RAID-10 group on each storage enclosure to log files (*.LDF). In Fast Track v3, tempdb is mixed with user databases.

Disk Volumes. Have each operating system disk partitioned as one volume only. Don't divide each disk into multiple logical volumes.


Disk Volumes. Partition each disk volume with a Starting Offset of 1024K (1048576).

Disk Volumes. Do NOT use Windows NTFS File Compression.

Disk Volumes. Format disk volumes using NTFS. Do not use FAT or FAT32.

Disk Volumes. Use Windows Mount Point Volumes (folders) instead of drive letters in Failover Clusters.

Disk Volumes. Format each SQL Server disk volume (data, log, tempdb, backups) with an Allocation Unit of 64KB, and do a quick format if volumes are SAN Logical Units (LUNs).

Disk Volumes. Ratio #1. Be sure that the result of Disk Partition Offset (ex. 1024KB) ÷ RAID Controller Stripe Unit Size (ex. 64KB) is an integer value (here 1024 ÷ 64 = 16). NOTE: This specific ratio is critical to minimize disk misalignment.

Disk Volumes. Ratio #2. Be sure that the result of RAID Controller Stripe Unit Size (ex. 64KB) ÷ Disk Partition Allocation Unit Size (ex. 64KB) is an integer value (here 64 ÷ 64 = 1).

Disk Volumes. Assign a unique disk volume to the MS DTC log file. Also, before installing a SQL Server Failover Cluster, create a separate resource dedicated to MS DTC.

Windows Internal Services. Disable any Windows service not needed for SQL Server.

Windows Page File. Be sure that Windows paging is configured to use the operating system disks only. Do not place a paging file on any of the SQL Server disks.

Antivirus. The antivirus software should be configured to NOT scan SQL Server database, log, tempdb, and backup folders (*.mdf, *.ldf, *.ndf, *.bak).

SQL Server Engine Startup Flags for Fast Track v3 (Data Warehousing). Start the SQL Server Engine with the -E and -T1117 startup flags.

SQL Server Service Accounts. Assign a different Active Directory service account to each SQL Server service installed.

Service Account and Windows Special Rights. Assign the SQL Server service account the following Windows user right policies: 1) Lock pages in memory, and 2) Perform volume maintenance tasks.

Fast Track v3 (DW) Multi-path I/O (MPIO) to SAN. Install and configure Multi-Path I/O (MPIO), configure each disk volume to have multiple MPIO paths defined with at least one Active path, and consult the SAN vendor's prescribed documentation.

Address Windowing Extensions (AWE). If the SQL Server service account has the Lock pages in memory Windows user right, then enable the SQL instance AWE memory option. (Note: AWE was removed from SQL Server 2012; use 64-bit!)


Instance Maximum Server Memory. If there is only one (1) SQL Database Instance and no other SQL engines, then configure the instance's Maximum Server Memory option with a value of 85% of the total physical memory available.

Tempdb Data Files. Be sure that the tempdb database has the same number of data files as CPU cores, all of the same size.

Startup Parameter T1118. Evaluate the use of trace flag T1118 as a startup parameter for the RDBMS engine to minimize allocation contention in tempdb.

Maximum Degree of Parallelism (MAXDOP). For OLTP systems, configure the instance's MAXDOP=1 or higher (up to 8) depending on the number of physical CPU chips. For OLAP systems, configure MAXDOP=0 (zero).

Maximum Worker Threads. Configure the instance's Maximum Worker Threads = 0 (zero).

Boost SQL Server Priority. Configure the instance's Boost SQL Server Priority=0 (zero).

Database Data and Log Default Locations. Configure the instance database default locations for data and log files.

Backup Files Default Location. Configure the instance backup location.

Backup Compression. In SQL Server 2008, enable the instance backup compression option.
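
Several of the instance-level settings above map directly onto `sp_configure` options. The following is a minimal sketch of that mapping; it covers only worker threads, priority boost, and backup compression, and leaves memory and MAXDOP values to be sized per server.

```sql
-- These options are advanced options, so expose them first.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

EXEC sys.sp_configure N'max worker threads', 0;          -- 0 = let SQL Server decide
EXEC sys.sp_configure N'priority boost', 0;              -- takes effect after a restart
EXEC sys.sp_configure N'backup compression default', 1;  -- compress backups by default
RECONFIGURE;

-- Review configured vs. running values.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name IN (N'max worker threads', N'priority boost', N'backup compression default');
```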

Filegroups. Before creating any database object (tables, indexes, etc.), create a new default filegroup (NOT PRIMARY) for data.

Data and Log Files Initial Size. Pre-allocate data and log file sizes. This helps minimize disk block fragmentation and avoids time-consuming file growth operations that stall processing until they finish.

Fast Track v3 (DW) Compression. For Fact Tables use Page Compression. On the other hand, compression for Dimension tables should be considered on a case-by-case basis.

Fast Track v3 (DW) Index Defragmentation. When defragmenting indexes, use ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (MAXDOP = 1, SORT_IN_TEMPDB = ON) to improve performance and avoid filegroup fragmentation. Do not use the ALTER INDEX REORGANIZE statement. To defragment indexes on FACT TABLES in data warehouses, also include DATA_COMPRESSION = PAGE.
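
Put together, a fact table rebuild following this guidance might look like the sketch below. The index name `IX_FactSales_DateKey` and table `dbo.FactSales` are hypothetical.

```sql
-- Hypothetical index and table names.
-- MAXDOP = 1 keeps the rebuilt index contiguous; SORT_IN_TEMPDB moves the
-- sort work out of the user filegroup; PAGE compression suits fact tables.
ALTER INDEX IX_FactSales_DateKey ON dbo.FactSales
REBUILD WITH (
    MAXDOP = 1,
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);
```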

Tools. Use the Microsoft SQL Server 2008 R2 Best Practices Analyzer (BPA) to determine whether anything was missed or configured contrary to best practices.


Tools. Use Microsoft NT Testing TCP Tool (NTttcp) to determine actual network throughput.

Tools. Use Microsoft SQLIO and Microsoft SQLIOSim to stress test storage and detect communication errors.

Tools. Use CPUID CPU-Z to determine processor information, especially the speed at which it is currently running.

Tools. Use Intel Processor Identification to determine processor information, especially whether Hyper-Threading is running.

Object | Counter | Value | Notes
Paging File | % Usage | <70% | Amount of page file currently in use.
Processor | % Processor Time | <= 80% | The higher it is, the more likely users are delayed.
Processor | % Privileged Time | <30% of % Processor Time | Amount of time spent executing kernel commands like SQL Server I/O requests.
Process(sqlservr), Process(msmdsrv) | % Processor Time | < 80% | Percentage of elapsed time spent on SQL Server and Analysis Server process threads.
System | Processor Queue Length | <4 | < 12 per CPU is good/fair, < 8 is better, < 4 is best.

Logical Disk Counter | Storage Guy's Term | Description
Disk Reads/sec, Disk Writes/sec | IOPS | Measures the number of I/Os per second. Discuss with the vendor the sizing of spindles of different types and rotational speeds. Impacted by disk head movement (i.e. short stroking the disk will provide more I/O per second capacity).
Avg. Disk sec/Read, Avg. Disk sec/Write | Latency | Measures disk latency. Numbers will vary; optimal values for averages over time: 1-5 ms for Log (ideally 1 ms or better); 5-20 ms for Data (OLTP) (ideally 10 ms or better); <= 25-30 ms for Data (DSS).
Avg. Disk Bytes/Read, Avg. Disk Bytes/Write | Block Size | Measures the size of I/Os being issued. Larger I/Os tend to have higher latency (example: BACKUP/RESTORE).
Avg./Current Disk Queue Length | Outstanding or waiting IOPS | Should not be used to diagnose good/bad performance. Provides insight into the application's I/O pattern.
Disk Read Bytes/sec, Disk Write Bytes/sec | Throughput or Aggregate Throughput | Measure of total disk throughput. Ideally, larger block scans should be able to heavily utilize connection bandwidth.

Object | Counter | Value | Notes
Physical Disk | Avg. Disk sec/Read | <8 ms | > 20 is poor, <20 is good/fair, <12 is better, <8 is best.
Physical Disk | Avg. Disk sec/Write | <8 ms or <1 ms | Without cache: > 20 poor, <20 fair, <12 better, <8 best. With cache: > 4 poor, <4 fair, <2 better, <1 best.
Memory | Available MBytes | >100 | Amount of physical memory available to run processes on the machine.
SQL Server: Memory Manager | Memory Grants Pending | ~0 | Current number of processes waiting for a workspace memory grant.
SQL Server: Buffer Manager | Page Life Expectancy | >=300 | Time, in seconds, that a page stays in the memory pool without being referenced before it is flushed.
SQL Server: Buffer Manager | Free List Stalls/sec | <2 | Frequency with which requests for db buffer pages are suspended because there are no free buffers.

Object | Counter | Value | Notes
SQL Server: Access Methods | Forwarded Records/sec | <10* | Tables with records traversed by a pointer. Should be < 10 per 100 batch requests/sec.
SQL Server: Access Methods | Page Splits/sec | <20* | Number of 8k pages that filled and split into two new pages. Should be < 20 per 100 batch requests/sec.
SQL Server: Databases | Log Growths/sec; Percent Log Used | < 1 and < 80%, resp. | Don't let transaction log growth happen randomly!
SQL Server: SQL Statistics | Batch Requests/sec | | No firm number without benchmarking, but > 1000 is a very busy system.
SQL Server: SQL Statistics | Compilations/sec; Recompilations/sec | | Compilations should be < 10% of batch requests/sec; recompilations should be < 10% of compilations/sec.
SQL Server: Locks | Deadlocks/sec | <1 | Number of lock requests that caused a deadlock.
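
Besides Performance Monitor, the SQL Server counters in these tables can also be read from inside the engine. The sketch below pulls two point-in-time counters; note that the "/sec" counters are stored as cumulative totals in this view, so they need two samples and a subtraction, which is not shown here.

```sql
-- Point-in-time counters: their cntr_value can be compared to the thresholds directly.
SELECT RTRIM([object_name]) AS [object_name],
       RTRIM(counter_name)  AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE ([object_name] LIKE N'%Buffer Manager%'
       AND counter_name = N'Page life expectancy')
   OR ([object_name] LIKE N'%Memory Manager%'
       AND counter_name = N'Memory Grants Pending');
```

The `LIKE` on `object_name` is used because the prefix differs between default instances (`SQLServer:`) and named instances (`MSSQL$InstanceName:`).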

DON'T RUN SQL Profiler on the server.
Then what?
Run SQL Profiler on your own computer.
Connect to the server.
Select the events and columns wanted.
Filter by the database to be evaluated.
Run the trace for 1 second, then stop it.
Export the trace as a script.
Optimize the script.
And then, and only then, run the SQL Trace script on the server.
And to evaluate?
Use the fn_trace_gettable() function to query the content of the SQL Trace file(s).
You can use the SQL Trace file(s) with SQL Server Database Engine Tuning Advisor to evaluate the creation of new indexes.
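
Reading a server-side trace file back with `fn_trace_gettable()` could look like this. The file path `D:\Traces\workload.trc` is hypothetical, and the filter assumes the trace captured the SQL:BatchCompleted event (event class 12).

```sql
-- Hypothetical trace file path. DEFAULT reads the file and all of its rollover files.
SELECT TextData, Duration, CPU, Reads, Writes, StartTime
FROM sys.fn_trace_gettable(N'D:\Traces\workload.trc', DEFAULT)
WHERE EventClass = 12          -- SQL:BatchCompleted
ORDER BY Duration DESC;        -- slowest batches first (Duration is in microseconds)
```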

General event handling
Goal is to make available well-defined data in XML format from execution points in code
Baked into SQL Server code
Layers on top of Event Tracing for Windows
Used by
SQL Trace, Performance Monitor and SQL Server Audit
Windows Event Log or SQL Error Log
As desired by users in admin or development
Introduced in SQL Server 2008

Superset of Extended Events
Can be used in conjunction with Extended Events
Can be a consumer or target of Extended Events
Kernel level facility

Built-in set of objects in an EXE or DLL (aka Module)
SQL Server has three types of packages
Package0
SQLServer
SQLOS

Packages contain one or more object types
Events
Actions
Predicates
Targets
Types
Maps

Monitoring point of interest in code of a module
Event firing implies:
Point of interest in code reached
State information available at time event fired

Events defined statically in package registration
Versioned schema defines contents
Schema with well-defined data types
Event data always has columns in same order
Targets can pick columns to consume

Targets are event consumers
Targets can
Write to a file
Aggregate event data
Start a task/action that is related to an event
Process data synchronously or asynchronously
Either file targets or in-memory targets

File Targets
Event File
ETW File

In-Memory Targets
Ring Buffer
Event Bucketing
Event Pairing
Synchronous Event Counting

Executed on top of events before event info is stored in buffers (which may be later sent to storage)
Currently used to
Get additional data related to event
TSQL statement
User
TSQL process info
Generate a mini-dump
Defined in ADD/ALTER EVENT clause

Logical expressions that gate whether an event fires
Pred_Compare: operator for pair of values
Value Compare Value
Example: Severity < 16
Example: Error_Message = 'Hello World!'

Pred_Source: generic data not usually in event
Package.Pred_Source Compare Value
Example: SQLServer.user_name = 'Chuck'
Example: SQLOS.CPU_ID = 0
Defined in ADD/ALTER EVENT clause

Real-time data capture


No performance penalty
Based on Event Tracing for Windows (ETW)
Full programmability support

Packages
Events and Actions
Filters and Predicates
Sessions
Targets
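
To tie packages, events, actions, predicates, sessions, and targets together, here is a small illustrative session. The session name `demo_error_capture` is made up, and the choice of the `error_reported` event with a severity predicate is just one example of the pieces described above.

```sql
-- Hypothetical session name. Captures errors of severity 16 or higher.
CREATE EVENT SESSION [demo_error_capture] ON SERVER
ADD EVENT sqlserver.error_reported            -- event from the sqlserver package
(
    ACTION (sqlserver.sql_text,               -- actions: extra data collected when the event fires
            sqlserver.username)
    WHERE (severity >= 16)                    -- predicate: gates whether the event fires
)
ADD TARGET package0.ring_buffer               -- in-memory target from package0
WITH (STARTUP_STATE = OFF);
GO

-- Start collecting.
ALTER EVENT SESSION [demo_error_capture] ON SERVER STATE = START;
GO

-- Read what the ring buffer target has collected so far (returned as XML).
SELECT CAST(t.target_data AS xml) AS captured_events
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
    ON t.event_session_address = s.address
WHERE s.name = N'demo_error_capture';
```

When finished, the session can be stopped with `ALTER EVENT SESSION ... STATE = STOP` and removed with `DROP EVENT SESSION`.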