
SQL Server Performance Issue

Quick Reference
Initial data to collect
• Windows Event Log – Application and System
  o Any unusual error messages (for example, degraded components) or errors/warnings that coincide with the issue?
• SQL Server Error Log
  o Any unusual errors or warnings?
• sys.dm_os_waiting_tasks
  o Who is currently waiting? What are they waiting on?
• sys.dm_exec_connections, sys.dm_exec_requests, sys.dm_exec_sessions
  o Who is currently running? What are they running?
• sys.dm_os_wait_stats
  o What is the wait percentage since the last manual clearing of stats or last restart?
• Ask whether this is a virtual machine or a physical server
  o If it is virtual, you need to understand the following:
  o Resources allocated to the guest?
    ▪ vCPUs
    ▪ Memory
  o Provisioning at the host level?
    ▪ # of guests at the host level
    ▪ Resources provisioned to those guests
  o Any restrictions / limits on CPU and memory?
  o Access to vCenter stats?
    ▪ Since the in-guest % Processor Time counter does not represent physical hardware resource consumption, we’ll want to see VM Processor\% Processor Time
  o General methods for monitoring VMware include esxtop (host level, for admins), Virtual Center and in-guest performance counters
    ▪ I’m going to assume that DBAs won’t have access to esxtop, so I would recommend requesting “Read Only” access to Virtual Center so that they can view the state of a VM.
    ▪ Within Virtual Center we’ll want to see “CPU ready” summation values (and check out the following KB regarding conversions: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2002181)
    ▪ For VMware vSphere ESX 4 and higher, the guest counters should be built in, with no additional permissions needed. Noteworthy counters include:
      • % Processor Time (at host level)
      • Host Processor Speed (MHz)
      • Limit (MHz)
      • Reservation (MHz)
      • Memory Limit (MB)
      • Memory Reservation (MB)
      • Memory Ballooned (MB) – we never want to see non-zero values here
      • Memory Swapped (MB) – another one that should always be 0
  o VMware host power settings (should be “high performance”)
• Look at running traces (sys.traces) or XE sessions
  o Anything running that is non-standard? Be sure to look out for “observer overhead”
• Perfmon stats (most recent counter logs)
  o Use these numbers to look for non-standard values or values that skew from the average baseline
• Top resource consumers from sys.dm_exec_query_stats
  o Who are the top consumers of I/O and CPU?
  o Who are the top consumers if you group by query_hash? (2008 and up)
• SQL Server instance settings from sys.configurations
  o Any non-default or non-standard settings?
  o Any changes since the last check?
• Database settings from sys.databases
  o Any non-default or non-standard settings?
    ▪ “Auto” options misconfigured?
    ▪ Shrink enabled?
    ▪ Uncommon non-default values?
    ▪ High Virtual Log File counts?
    ▪ File placement (any files on the C: drive)?
    ▪ Default single-file tempdb?
    ▪ Multiple transaction log files? (There is no good long-term reason for multiple transaction log files.)
    ▪ Non-current database compatibility level?
    ▪ Auto-growth by percent instead of by size?
    ▪ Non-checksum page verification?
  o Any changes since the last check?
• Is Resource Governor configured, or have any of the RG defaults been modified?
• What are the OS/BIOS power settings? (should be “high performance”)
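Several of the collection steps above can be scripted. The following T-SQL is a minimal sketch, assuming SQL Server 2008 or later and VIEW SERVER STATE permission. The benign wait types excluded in step 3 are a short illustrative sample, not a complete filter. Step 4 only shows settings whose configured value has not yet been applied, because spotting non-default values requires your own list of expected defaults.

```sql
-- 1. Who is currently waiting, and on what? (user sessions only)
SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.blocking_session_id,
       wt.resource_description
FROM sys.dm_os_waiting_tasks AS wt
JOIN sys.dm_exec_sessions AS s ON s.session_id = wt.session_id
WHERE s.is_user_process = 1
ORDER BY wt.wait_duration_ms DESC;

-- 2. Who is currently running, and what are they running?
SELECT r.session_id,
       s.login_name,
       s.program_name,
       c.client_net_address,
       r.status,
       r.command,
       r.wait_type,
       r.cpu_time,
       r.total_elapsed_time,
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s ON s.session_id = r.session_id
LEFT JOIN sys.dm_exec_connections AS c ON c.session_id = r.session_id
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE s.is_user_process = 1
  AND r.session_id <> @@SPID          -- exclude this diagnostic session
ORDER BY r.cpu_time DESC;

-- 3. Wait percentages since the last restart or manual clear
--    (DBCC SQLPERF (N'sys.dm_os_wait_stats', CLEAR) resets the counters)
SELECT TOP (10)
       wait_type,
       wait_time_ms,
       signal_wait_time_ms,
       CAST(100.0 * wait_time_ms / NULLIF(SUM(wait_time_ms) OVER (), 0)
            AS decimal(5, 2)) AS pct_of_total
FROM sys.dm_os_wait_stats
WHERE waiting_tasks_count > 0
  AND wait_type NOT IN (  -- illustrative, not exhaustive, list of benign waits
      N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'SQLTRACE_BUFFER_FLUSH',
      N'REQUEST_FOR_DEADLOCK_SEARCH', N'XE_TIMER_EVENT', N'XE_DISPATCHER_WAIT',
      N'LOGMGR_QUEUE', N'CHECKPOINT_QUEUE', N'BROKER_TO_FLUSH', N'BROKER_TASK_STOP',
      N'WAITFOR', N'CLR_AUTO_EVENT', N'DIRTY_PAGE_POLL', N'SP_SERVER_DIAGNOSTICS_SLEEP')
ORDER BY wait_time_ms DESC;

-- 4. Instance settings changed but not yet applied (configured <> running value)
SELECT name, value, value_in_use
FROM sys.configurations
WHERE value <> value_in_use;

-- 5. Database settings and file layout worth a second look
SELECT name, compatibility_level, page_verify_option_desc, recovery_model_desc,
       is_auto_shrink_on, is_auto_close_on,
       is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases
ORDER BY name;

SELECT DB_NAME(database_id) AS database_name,
       name AS logical_name,
       type_desc,
       physical_name,
       is_percent_growth
FROM sys.master_files
WHERE is_percent_growth = 1
   OR physical_name LIKE N'C:%'
ORDER BY database_name;
```

Saving the output of steps 4 and 5 gives you the "last check" to compare against next time.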
Recommended reading
This reference summarizes a very broad subject, so at a minimum you should familiarize yourself with the following freely downloadable references:

Performance Tuning Using Waits and Queues

Troubleshooting Performance Problems in SQL Server 2008

Patterns
The following table is adapted, modified and expanded from two sources: “SQL Server: Common Performance Issue Patterns” (Pluralsight course) and “Performance Guidance for SQL Server in Windows Azure Virtual Machines” (SQL CAT whitepaper):

Issue Symptoms Next steps


High CPU
Symptoms:
• High CPU % across available CPU counters
• Process counter shows that it is SQL Server user-time
• Can be coupled with SOS_SCHEDULER_YIELD / signal wait time
Next steps:
• Confirm that SQL Server is consuming CPU and not another process
• On a shared environment, confirm which SQL Server instance is consuming the most CPU
• Confirm user vs. kernel time (kernel time has a different troubleshooting path)
• How many schedulers are visible and online (check sys.dm_os_schedulers)?
• Any CPU-related configurations modified in sys.configurations?
• If user-time, identify top consuming queries and tune. Look for:
  o Executing queries associated with high worker time
  o High compilation / recompilation
  o “Observer overhead”
  o Uninvited parallelism (for example, OLTP-centric workloads running in parallel that shouldn’t be)
• Split out the workload (for example, move a database off the SQL Server instance)
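A minimal T-SQL sketch of the first CPU checks, assuming SQL Server 2005 or later and VIEW SERVER STATE permission. sys.dm_exec_query_stats only covers plans that are still cached, and its totals accumulate from the time each plan was cached. Treat the ranking as indicative rather than complete.

```sql
-- Scheduler status: VISIBLE ONLINE schedulers are the ones user queries can use
SELECT status, COUNT(*) AS scheduler_count
FROM sys.dm_os_schedulers
GROUP BY status;

-- CPU-related instance settings that are commonly changed
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name IN (N'max degree of parallelism', N'cost threshold for parallelism',
               N'affinity mask', N'priority boost', N'lightweight pooling');

-- Share of wait time spent in the runnable queue waiting for CPU (signal waits)
SELECT CAST(100.0 * SUM(signal_wait_time_ms) / NULLIF(SUM(wait_time_ms), 0)
            AS decimal(5, 2)) AS signal_wait_pct
FROM sys.dm_os_wait_stats;

-- Top cached statements by total CPU (worker time is reported in microseconds)
SELECT TOP (10)
       qs.total_worker_time / 1000 AS total_cpu_ms,
       qs.execution_count,
       qs.total_worker_time / qs.execution_count / 1000 AS avg_cpu_ms,
       qs.total_logical_reads,
       SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
                 (CASE qs.statement_end_offset
                      WHEN -1 THEN DATALENGTH(st.text)
                      ELSE qs.statement_end_offset
                  END - qs.statement_start_offset) / 2 + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```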

External CPU Pressure
Symptoms:
• High “Process: % User Time” (user mode) or “Process: % Privileged Time” (kernel mode) – not associated with the SQL Server instance
Next steps:
• Which application is associated with the high CPU? (Perfmon)
• Should it be driving this much CPU? (Baseline data)
• Consider isolation of that application away from the SQL Server instance
• Don’t waste time troubleshooting CPU-driving SQL Server workloads if the issue is external!
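One way to see whether CPU is being consumed by SQL Server or by something else on the server is the scheduler-monitor history in sys.dm_os_ring_buffers. This is a sketch assuming SQL Server 2008 or later. The ring buffer DMV is only partially documented, and records are written roughly once a minute, so read the output as a coarse history rather than a precise measurement. "Other processes" is derived here as 100 minus SQL Server CPU minus idle.

```sql
-- Convert ring buffer timestamps (ms since startup) to wall-clock time
DECLARE @ts_now bigint = (SELECT cpu_ticks / (cpu_ticks / ms_ticks) FROM sys.dm_os_sys_info);

WITH samples AS (
    SELECT rb.[timestamp],
           rb.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_pct,
           rb.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS idle_pct
    FROM (SELECT [timestamp], CONVERT(xml, record) AS record
          FROM sys.dm_os_ring_buffers
          WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
            AND record LIKE N'%<SystemHealth>%') AS rb
)
SELECT TOP (30)
       DATEADD(ms, -1 * (@ts_now - [timestamp]), GETDATE()) AS sample_time,
       sql_cpu_pct,
       idle_pct,
       100 - sql_cpu_pct - idle_pct AS other_process_cpu_pct
FROM samples
ORDER BY [timestamp] DESC;
```

A persistently high other_process_cpu_pct points toward the external-pressure path. Perfmon's Process counters then identify the application.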

High Privileged Time
Symptoms:
• High “Process: % Privileged Time” (kernel mode) associated with SQL Server itself
Next steps – check for:
• Filter drivers injecting themselves into the Windows driver stack (anti-virus, encryption services, file-system defragmenters)
• Missing firmware updates or drivers
• Defective or significantly insufficient I/O components


I/O Capacity or Latency Issues
Symptoms:
• Slow performance at peak business periods or during maintenance periods
• High or quickly increasing percentage of PAGEIOLATCH_XX, IO_COMPLETION, WRITELOG, ASYNC_IO_COMPLETION waits
• Avg. Disk sec/Read or Avg. Disk sec/Write with excessive latency
• High latency measured in sys.dm_io_virtual_file_stats
• I/O stall warnings in the SQL Server error log
• Degradation warnings in the Windows event logs
Next steps:
• Check the Page Life Expectancy counter for low values, that is, less than (DataCacheSizeInGB / 4 GB) * 300 seconds. Low values can indicate that memory pressure on the system is causing increased disk I/O.
• Identify which database and log files have an I/O bottleneck, and prioritize tuning based on sys.dm_io_virtual_file_stats
• Check previous benchmark / baseline operations on the disk subsystem and validate whether I/O capacity has been reached or exceeded
• When were drivers last updated? When was the firmware last updated?
• Get everyone “on the path” involved – SAN / Windows / DBA
• Get specifics – for example, RAID level, spindles, partition offset, HBA queue depth
• The issue could reflect a memory pressure issue as well
• Look for ways to decrease I/O demands (indexing, query refactoring, scaling out)
• Use index DMVs within the triaged databases to prioritize
• Evaluate top I/O-generating plans via sys.dm_exec_query_stats and sys.dm_exec_query_plan
• Evaluate execution plans / indexing
• Fine-tune index maintenance (for example, index rebuilds where most impactful)
• Assuming workloads are optimized and appropriate indexes exist, consider improvements to the disk subsystem
• Consider enabling row or page compression to reduce the number of I/Os
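The per-file latency check can be sketched as follows, assuming VIEW SERVER STATE permission. The figures accumulate from instance startup. For a specific incident, capture two snapshots and compare the difference. No latency threshold is hard-coded, because what is acceptable depends on your baseline.

```sql
-- Average read/write latency per database file since instance startup
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       mf.type_desc,
       vfs.num_of_reads,
       vfs.num_of_writes,
       CASE WHEN vfs.num_of_reads = 0 THEN 0
            ELSE vfs.io_stall_read_ms / vfs.num_of_reads END AS avg_read_latency_ms,
       CASE WHEN vfs.num_of_writes = 0 THEN 0
            ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_latency_ms,
       vfs.io_stall AS total_io_stall_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs   -- NULL, NULL = all databases, all files
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY vfs.io_stall DESC;
```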

Memory Resource Pressure
Symptoms:
• Low Memory: Available Bytes
• Low SQL Server: Buffer Manager\Page Life Expectancy
• Sudden decrease in performance across all queries, or issues with specific queries that have high memory demands
• Signs of significant I/O (shared symptoms)
• Pattern may involve specific errors in conjunction with degradation:
  o Error 701 “There is insufficient system memory to run this query.”
  o Error “A significant part of sql server process memory has been paged out.”
  o Error “Failed Virtual Allocate Bytes: FAIL_VIRTUAL_RESERVE”
Next steps:
• Check the max server memory setting for SQL Server – is it appropriate?
• Check which component of SQL Server utilizes memory (such as CLR, high memory grants for application queries, and so on) and tune appropriately
• Missing indexes, non-sargable predicates?
• Any query hints being used?
• Any high-cost sort operations? If so, can the sorts be avoided?
• Check for Hash Match operations – are they appropriate given the estimated vs. actual row counts?
• If a 32-bit system, is this virtual memory pressure?
• Determine if this is external memory pressure:
  o Heavy non-SQL activities?
  o Collocated instances?
  o Excessive system cache due to file copies?
  o Faulty drivers?
  o LPIM (Lock Pages in Memory) not set, or set without capping max server memory?
• Check perfmon counters over the event period:
  o Memory\Available MBytes
  o SQLServer:Buffer Manager\Page Life Expectancy
  o SQLServer:Buffer Manager\Lazy Writes/sec
  o SQLServer:Buffer Manager\Page Reads/sec
• Check I/O-related counters
• Check memory-related DMVs
• Check the ring buffer for memory notifications
• Check DBCC MEMORYSTATUS to find large consumers
• 32-bit system? VAS pressure?
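A sketch of the first memory checks, assuming SQL Server 2008 or later. It approximates DataCacheSizeInGB from the Buffer Manager "Database pages" counter (8 KB pages). That is an assumption; other ways of sizing the data cache exist. The Buffer Manager PLE value is instance-wide. NUMA systems also expose per-node values under Buffer Node.

```sql
-- Memory limits configured for the instance
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN (N'max server memory (MB)', N'min server memory (MB)');

-- Windows' view of physical memory (comparable to Memory\Available MBytes)
SELECT total_physical_memory_kb / 1024 AS total_physical_mb,
       available_physical_memory_kb / 1024 AS available_physical_mb,
       system_memory_state_desc
FROM sys.dm_os_sys_memory;

-- Page Life Expectancy compared with the (DataCacheSizeInGB / 4 GB) * 300 rule of thumb
WITH bm AS (
    SELECT RTRIM(counter_name) AS counter_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE N'%Buffer Manager%'
      AND counter_name IN (N'Page life expectancy', N'Database pages')
),
pivoted AS (
    SELECT MAX(CASE WHEN counter_name = N'Page life expectancy' THEN cntr_value END) AS ple_seconds,
           -- pages * 8 KB / 1,048,576 = GB
           MAX(CASE WHEN counter_name = N'Database pages' THEN cntr_value END) * 8 / 1048576.0 AS data_cache_gb
    FROM bm
)
SELECT ple_seconds,
       data_cache_gb,
       data_cache_gb / 4 * 300 AS ple_threshold_seconds,
       CASE WHEN ple_seconds < data_cache_gb / 4 * 300
            THEN 'below threshold' ELSE 'ok' END AS ple_status
FROM pivoted;
```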


Resource Semaphore
Symptoms:
• Error 8645 when a query is executed: “A time out occurred while waiting for memory resources to execute the query. Rerun the query.”
• Other (lower-cost) queries still run fine concurrently
• The failing query may run fine in quiet periods but fail under concurrent load
• RESOURCE_SEMAPHORE wait type
• Memory Grants Pending counter at non-zero values, and output from sys.dm_exec_query_memory_grants
Next steps:
• Look at the execution plan and see why higher query execution memory is being requested
• The error happens before the query is actually executed, while it is waiting for memory
• The estimated memory request may be distorted due to cardinality estimate issues
• Look for sort & hash operations
• Test with and without parallelism (decreasing it can reduce memory requirements)
• Look for other high memory-consuming queries to tune
• Using Resource Governor? Check for misconfigurations
• If there is overall memory pressure and the queries cannot be tuned further (indexing, etc.), you may have a scale-up or scale-out situation
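The semaphore checks can be sketched as follows, assuming SQL Server 2008 or later (ideal_memory_kb is not present in 2005). Requests with a NULL grant_time are the ones still queued on RESOURCE_SEMAPHORE.

```sql
-- Memory grants: granted vs. still waiting, largest requests first
SELECT mg.session_id,
       CASE WHEN mg.grant_time IS NULL THEN 'waiting' ELSE 'granted' END AS grant_state,
       mg.requested_memory_kb,
       mg.granted_memory_kb,
       mg.ideal_memory_kb,
       mg.wait_time_ms,
       mg.dop,
       mg.query_cost,
       t.text AS sql_text
FROM sys.dm_exec_query_memory_grants AS mg
OUTER APPLY sys.dm_exec_sql_text(mg.sql_handle) AS t
ORDER BY mg.requested_memory_kb DESC;

-- Per-semaphore summary (waiter_count > 0 means requests are queued)
SELECT pool_id, resource_semaphore_id, target_memory_kb, available_memory_kb,
       granted_memory_kb, grantee_count, waiter_count, timeout_error_count
FROM sys.dm_exec_query_resource_semaphores;

-- Memory Grants Pending perfmon counter
SELECT cntr_value AS memory_grants_pending
FROM sys.dm_os_performance_counters
WHERE object_name LIKE N'%Memory Manager%'
  AND counter_name = N'Memory Grants Pending';
```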

Server Hung
Symptoms:
• Unable to connect
Next steps:
• Attempt a connection with the DAC (Dedicated Administrator Connection)
• If you can connect with the DAC, look at waiting and running tasks and troubleshoot accordingly
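A sketch of the DAC steps, assuming sysadmin rights. By default the DAC only accepts connections from the server itself, so enabling remote DAC is best done before an incident. YourServerName is a placeholder.

```sql
-- Ahead of time: allow DAC connections from remote machines
EXEC sp_configure N'remote admin connections', 1;
RECONFIGURE;

-- During the incident, connect with the ADMIN: prefix, for example from a command prompt:
--   sqlcmd -S ADMIN:YourServerName -E
-- (YourServerName is a placeholder for your server or instance name.)

-- Once connected, confirm which session holds the DAC endpoint
SELECT s.session_id, s.login_name, s.login_time, s.status
FROM sys.endpoints AS e
JOIN sys.dm_exec_sessions AS s ON e.endpoint_id = s.endpoint_id
WHERE e.name = N'Dedicated Admin Connection';
```

Keep the diagnostic queries run over the DAC lightweight, since it has limited resources.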

Blocking
Symptoms:
• Slow performance – but not necessarily any red flags around CPU, memory, I/O
• High percentage of LCK_% waits
• Blocked processes identified via sys.dm_os_waiting_tasks or another method
Next steps:
• Long-running transaction?
• High transaction isolation level?
• Slow-performing lead blocker?
• Orphaned transaction?
• BEGIN TRAN and no COMMIT TRAN?
• Check the blocking chains for details on the requests and resources involved (blocked process report)
• DTC? (translates everything to serializable)
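A minimal blocking-chain sketch, assuming SQL Server 2012 or later (open_transaction_count was added to sys.dm_exec_sessions in 2012). The second query approximates lead blockers as sessions that block others but are not blocked themselves.

```sql
-- Who is blocked, by whom, and on what resource
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time AS wait_time_ms,
       r.wait_resource,
       DB_NAME(r.database_id) AS database_name,
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC;

-- Likely lead blockers. A 'sleeping' session with open_transaction_count > 0
-- often indicates BEGIN TRAN without a matching COMMIT TRAN.
SELECT s.session_id,
       s.status,
       s.login_name,
       s.host_name,
       s.program_name,
       s.open_transaction_count,
       s.last_request_end_time
FROM sys.dm_exec_sessions AS s
WHERE s.session_id IN (SELECT blocking_session_id FROM sys.dm_exec_requests
                       WHERE blocking_session_id <> 0)
  AND s.session_id NOT IN (SELECT session_id FROM sys.dm_exec_requests
                           WHERE blocking_session_id <> 0);
```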


Lock Escalation
Symptoms:
• Significant reduction in concurrency
• You see shifts from lower-granularity locks to table-level exclusive locks instead
• You see lock escalation events reported in SQL Trace or via Extended Events
Next steps:
• Identify queries associated with these table-level exclusive locks
• Are the statements filtering the rows appropriately?
• Can you break it up into smaller sets of operations?
• Missing or disused indexes? Bookmark lookups?
• Consider (after exhausting tuning opportunities):
  o ALTER TABLE … SET ( LOCK_ESCALATION = { AUTO | TABLE | DISABLE } )
  o Trace flags 1211, 1224
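You can check and change the escalation setting per table as shown below. This assumes SQL Server 2008 or later. dbo.Orders is a hypothetical table name used only for illustration. According to the SQL Server documentation, DISABLE prevents escalation in most cases but not all. For example, scanning a heap under SERIALIZABLE can still take a table lock.

```sql
-- Current escalation setting for each user table (TABLE is the default)
SELECT SCHEMA_NAME(t.schema_id) AS schema_name,
       t.name AS table_name,
       t.lock_escalation_desc
FROM sys.tables AS t
ORDER BY schema_name, table_name;

-- Illustration only: dbo.Orders is a hypothetical table.
-- AUTO lets a partitioned table escalate to the partition level instead of the whole table;
-- on a non-partitioned table it behaves like TABLE.
ALTER TABLE dbo.Orders SET (LOCK_ESCALATION = AUTO);
```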

Network / Slow Fetches from Application
Symptoms:
• Query or workload runs quickly when validated by developer or DBA – but slowly from the application
• High ASYNC_NETWORK_IO wait type associated with workloads
Next steps:
• Row-by-row application programming?
• Unnecessary queries being slowly consumed by the application?
• Actual network path latency issues?
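To tie ASYNC_NETWORK_IO waits back to an application, one point-in-time sketch joins waiting tasks to session and connection metadata. It only shows sessions that are waiting at the moment you run it, so you may need to repeat it during the slow period.

```sql
-- Sessions currently waiting on the client to consume results, with the app/host behind them
SELECT wt.session_id,
       wt.wait_duration_ms,
       s.program_name,
       s.host_name,
       s.login_name,
       c.client_net_address
FROM sys.dm_os_waiting_tasks AS wt
JOIN sys.dm_exec_sessions AS s ON s.session_id = wt.session_id
LEFT JOIN sys.dm_exec_connections AS c ON c.session_id = wt.session_id
WHERE wt.wait_type = N'ASYNC_NETWORK_IO'
ORDER BY wt.wait_duration_ms DESC;
```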

Query Compilation Pressure
Symptoms:
• RESOURCE_SEMAPHORE_QUERY_COMPILE waits
• High CPU utilization
• High number of ad hoc queries
• Volatile workloads (variable performance)
• Perfmon counter variances from the past baselines
Next steps:
• Check SQL Compilations/sec and SQL Re-Compilations/sec
  o This is where baselines come in handy – helping you define what is “normal”
  o Compare to Batch Requests/sec over time
• Use sys.dm_exec_query_stats and query_hash to find queries that are “the same” but are creating new plans due to a lack of parameterization
• Form an argument for parameterizing and preparing queries
• The EventSubClass column of the Profiler SP:Recompile / SQL:StmtRecompile events can be used to validate the reasons behind recompiles:
  o Schema, statistics, SET options, temp table changes, hints
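The counter comparison and the query_hash check can be scripted as follows, assuming SQL Server 2008 or later. The "/sec" counters in sys.dm_os_performance_counters hold cumulative totals. For that reason the sketch samples them twice, 10 seconds apart (an arbitrary window), and divides the difference by the interval.

```sql
-- Sample compilation counters over a short window
DECLARE @before TABLE (counter_name nvarchar(128) PRIMARY KEY, cntr_value bigint);

INSERT INTO @before (counter_name, cntr_value)
SELECT RTRIM(counter_name), cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE N'%SQL Statistics%'
  AND counter_name IN (N'Batch Requests/sec', N'SQL Compilations/sec', N'SQL Re-Compilations/sec');

WAITFOR DELAY '00:00:10';  -- 10-second window; must match the divisor below

SELECT b.counter_name,
       (pc.cntr_value - b.cntr_value) / 10.0 AS per_second
FROM @before AS b
JOIN sys.dm_os_performance_counters AS pc
  ON RTRIM(pc.counter_name) = b.counter_name
 AND pc.object_name LIKE N'%SQL Statistics%';

-- Statements that look "the same" (same query_hash) but are cached as many separate plans
SELECT TOP (20)
       qs.query_hash,
       COUNT(DISTINCT qs.plan_handle) AS distinct_plans,
       SUM(qs.execution_count) AS total_executions
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
HAVING COUNT(DISTINCT qs.plan_handle) > 1
ORDER BY distinct_plans DESC;
```

A query_hash with many plans and roughly one execution per plan is a typical sign of unparameterized ad hoc SQL.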

Tempdb Allocation Page Contention
Symptoms:
• Slow performance on workloads
• Latch waits on tempdb resources like 2:1:1 (first PFS page) & 2:1:3 (first SGAM page)
• Query workloads that rely heavily on temporary tables
• Query plans may show spools / sorts
• SET STATISTICS IO output shows worktables
Next steps:
• Validate the number of equally sized tempdb data files
  o CSS guidance:
    ▪ <= 8 cores: #files = #cores
    ▪ > 8 cores: #files = 8 (add in 4-file chunks / monitor for contention)
• Is TF 1118 enabled for SGAM contention?
• Can the workloads be further optimized to reduce reliance on tempdb?
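The following sketch compares the tempdb data file layout with the guidance above and lists current PAGELATCH waits on tempdb pages. It uses cpu_count (logical processors) as a stand-in for cores, which may overstate the core count on hyper-threaded hosts. The suggested count is only the starting point described in the guidance.

```sql
-- Number and size spread of tempdb data files versus logical CPU count
SELECT COUNT(*) AS data_file_count,
       MIN(f.size) * 8 / 1024 AS smallest_file_mb,   -- size is in 8 KB pages
       MAX(f.size) * 8 / 1024 AS largest_file_mb,
       i.cpu_count AS logical_cpus,
       CASE WHEN i.cpu_count <= 8 THEN i.cpu_count ELSE 8 END AS suggested_starting_files
FROM tempdb.sys.database_files AS f
CROSS JOIN sys.dm_os_sys_info AS i
WHERE f.type_desc = N'ROWS'
GROUP BY i.cpu_count;

-- Current page latch waits on tempdb (database_id 2), e.g. 2:1:1 (PFS) or 2:1:3 (SGAM)
SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE N'PAGELATCH_%'
  AND resource_description LIKE N'2:%';
```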

Tempdb Row Versioning
Symptoms:
• Query degradation in association with any of the following:
  o Row versioning isolation levels
  o Online index operations
  o Triggers
• Tempdb out-of-space errors
Next steps:
• Check tempdb DMVs
• Is tempdb appropriately sized?
• Is tempdb collocated with other high I/O consumers?
• Is row versioning helpful to key workloads?
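Checking the tempdb DMVs might look like the following sketch, assuming SQL Server 2008 or later. The first query shows how tempdb space is split between user objects, internal objects and the version store. The second lists the longest-running transactions that use row versioning. Such transactions can prevent version store cleanup.

```sql
USE tempdb;

-- How tempdb space is currently split (page counts * 8 KB / 1024 = MB)
SELECT SUM(user_object_reserved_page_count) * 8 / 1024 AS user_objects_mb,
       SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb,
       SUM(version_store_reserved_page_count) * 8 / 1024 AS version_store_mb,
       SUM(unallocated_extent_page_count) * 8 / 1024 AS free_mb
FROM sys.dm_db_file_space_usage;

-- Longest-running transactions that use row versioning
SELECT TOP (10) session_id, transaction_id, elapsed_time_seconds
FROM sys.dm_tran_active_snapshot_database_transactions
ORDER BY elapsed_time_seconds DESC;
```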

Tempdb I/O Issues
Symptoms:
• Tempdb usage is high, and tempdb is collocated with other databases
• Other databases also have high I/O demands
• Tempdb usage could be a combination of user objects, row versioning activity and workspace memory
Next steps:
• Start with troubleshooting the workloads first
• Don’t resort to an I/O path solution until you’ve eliminated opportunities for workload optimization
• When in doubt, try to isolate tempdb from other I/O activity
• But make sure that the I/O path is sufficient to meet I/O demands

Tempdb Query Workspace Overhead
Symptoms:
• Slow performance associated with the following query operations:
  o Hash joins, aggregates
  o Sort operations
  o LOB operations
  o Spools
  o Cursors
Next steps:
• Start with troubleshooting the workloads first:
  o Missing indexes
  o Cardinality estimate issues
  o Unnecessary sorts
• After addressing workload issues, then look at how to improve the tempdb I/O path
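To see which sessions are currently consuming tempdb workspace, the following sketch uses sys.dm_db_task_space_usage. That DMV reports on tempdb, so the script switches to it first. Internal objects correspond to the sort, hash, spool, LOB and cursor work listed above. User objects are temporary tables and table variables.

```sql
USE tempdb;

-- Pages allocated so far by currently running tasks, summed per session
SELECT TOP (10)
       tsu.session_id,
       SUM(tsu.internal_objects_alloc_page_count) * 8 / 1024 AS internal_objects_alloc_mb,
       SUM(tsu.user_objects_alloc_page_count) * 8 / 1024 AS user_objects_alloc_mb
FROM sys.dm_db_task_space_usage AS tsu
GROUP BY tsu.session_id
ORDER BY internal_objects_alloc_mb DESC;
```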

Query Plan Quality Issues
Symptoms:
• You are evaluating a set of query execution plans that are associated with poorly performing queries
• Performance may be good sometimes and bad other times (not predictable)
• Estimated versus actual row counts are significantly skewed
  o Example: “1,000,000” rows estimated, but actually only “1” row – or vice versa
• Query performs badly or has high waiting time due to memory grant needs
Next steps:
• Key areas to validate:
  o Execution plan
  o Associated statistics, indexing
  o Query construction
• Look for constructs that can impact plan quality and proper cardinality estimates:
  o Table variables, TVFs, MSTVFs, scalar functions, modifying in-flight variables, data-type conversions, wrapping indexed columns in expressions, missing indexes, missing or stale statistics
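There are two starting points for validating plans and statistics. This sketch assumes SQL Server 2012 SP1 or later (or 2008 R2 SP2), where sys.dm_db_stats_properties is available. The first pattern captures an actual plan, which records estimated and actual row counts for each operator. The second lists statistics ordered by the number of modifications since their last update.

```sql
-- 1. Capture the actual plan for one problem query, then compare each operator's
--    EstimateRows attribute with its ActualRows run-time counter in the showplan XML.
SET STATISTICS XML ON;
-- (run the problem query here)
SET STATISTICS XML OFF;

-- 2. Statistics in the current database with the most changes since the last update
SELECT OBJECT_SCHEMA_NAME(s.object_id) AS schema_name,
       OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.object_id, N'IsUserTable') = 1
ORDER BY sp.modification_counter DESC;
```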
