You are on page 1of 133

1

2
3
4
5
6
7
8
9
10
11
Reduce Data Sprawl
Add Intelligence
Using Graphics to Tune Oracle

Wednesday, June 11, 2008


Kyle Hailey
Kyle.Hailey@oracle.com
*** Overview ***
Target Audience: Production
DBA

A Developer would have a different view


Performance Anxiety
When the manager is asking why is the …

 Response is slow
 Database is hung
 Batch job behind schedule

How do you respond?


Respond Quickly and
Confidently

Like the Doctor:


 Test – collect Data
 Diagnosis - assessment
 Cure – fix problems
A New Approach to Oracle
Tuning
 Old OEM Design Approach
 New OEM Design Approach
 Macintosh Design Approach
 Result of Good Design
 Our Goal
Old Design Method:

 List All Tuning data


 Create screens for all data
 Verify that all data is displayed

=> Overwhelming and confusing


OEMs New Design Method

 Start with Typical Problems


 Identify Solutions
 Design a Visual Representation

=> Put intelligence into the interface


Macintosh Method

 What do people do
 How do they do it
 Design to support it

=> can be used without reading a manual


Without Reading a Manual

 Handspring’s site crashed Nov 25, 1999


 Biggest Sales Day of the Year
 Library cache latch contention
 No DBAs
 Downloaded Quest’s Spotlight
 Installed and Identified problem with
Minutes
 Solution in code fix
Goal
Make Oracle Tuning easy for Anyone
*** Manageability ***

Obstacles and Approaches


What Data to Collect ?
On NT 9iR2
Oracle Tuning Data in 9iR2 NT
 Stats 248
 Waits 361
 Latches 239
 Parameters 257
 Hidden Parameters 540
Total 1645
As well as
 259 V$ tables
 394 X$ tables
And machine data
Statspack Collects the Data

• Standard for Data Collection


 Selects from 44 x$, v$ and dba tables
and views
 1000-1500 lines output
But, How to Analyze the
Statspack Data?
• One page Summary
• Waits are Emphasized to Guide Analysis
• Despite that, Data can be confusing to
Analyze
 Oracle Press book is 600+ pages
 Users still get led astray
 Recent mail:
– IO waits dominate
– one SQL was 99% of the IO
– still concerned about parsing.
Origin of Problem: Tuning
Rules Lists

 Originated from Oracle Version 6


 Calculate ratios to deduce bottlenecks
 Often lead down the wrong path
 Result: tuning inconsequential aspects
Wait Interface – The Guide

 Introduced in Version 7
 Pinpoints performance Bottlenecks
 Used by Statspack
 Needs a clearer presentation than
Statspack
Silver Bullet Myth
 _go_faster=true
 Coping with overwhelming amount of
data
 Oracle Expert type products
 Not All Solutions are Known
Combine Analysis with
Diagnostics

• Known Solutions
• Diagnostics Methods
• Powerful Graphic Interface
• Flexibility to Grow with Changes
*** Graphics ***

the
Advantage
Why Use Graphics
 
You can't imagine how many times I
was told that nobody wanted or
would use graphics … now
everybody uses them.

-- Jef Raskin, the creator of the Macintosh


Macintosh Paradigm

 Introduced Graphical Windows


 Introduced the use of the “mouse”
 Revolutionized the Interface to the
computer
 First computer I used without Reading the
Manual
 Manageable computer
Oracle and Manageability
 Microsoft's perceived ease of use
 Competition Brings Innovation
 Oracle is now tackling manageability
 BMC, Quest, Precise improve the
manageability perception of Oracle
Why Use Graphics in Oracle
Tuning

“The human nervous system …


is exceptionally good at parsing
visual information, especially
when that information is coded
by color and/or motion.”

Knowledge representation in cognitive science. Westbury, C. & Wilensky, U. (1998)


Why Use Graphics in Oracle
Tuning

Increase comprehension of Data by


 Correlating
 Increasing data Density
 Providing known Solutions
 Guiding Investigation
“Parsability of Graphics”

Infocus – (overhead projectors)


sited a study that humans can
parse graphical information
400,000 times faster than
textual data
Trends: What are the Trends?
Trends: With Graphics

I II III
IV
Trends: Waits Over Time
date 17:13 17:13 17:14 17:14 17:15 17:15 17:16 17:16 17:17 17:17
17:18 17:18 17:19 17:19 17:20 17:20 17:21 17:21 17:22
17:22 17:23 17:23 17:24 17:24 17:25 17:25 17:26 17:26
17:27 17:27 17:28 17:28 17:29 17:29 17:30 17:30 17:31
17:31 17:32 17:32 17:33 17:33 17:34 17:34 17:35 17:35
17:36 17:36 17:37 17:37 17:38 17:38 17:39 17:39 17:40
17:40 17:41 17:41 17:42
log_file_switch_completion_3 0 0 0 0 2982936 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 237642 0 0 0 0
0 0 0 0 0 0 0 0 2100419
0 0 0 0
log_buffer_space_3 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 6478221 0
14945538 5219836 0 0
QL*Net_break/reset_to_client_3 8501199 11385539 7044359 8234690 6183237 2333297 7776659 6118198
6495648 7518489 11455326 7702865 11085762 7634300 10582833 13260400 8544422
13172524 8502041 12010958 9208533 9384118 10208957 7399676 6747457 6231611
13441429 7344615 8761884 6642950 7256315 7073623 8442915 7060399 7485662
6601072 6349018 9929440 4902230 6079359 5751021 9455790 9087866 9341209
7664906 16499162 10234535 10588829 8638014 8918795 4312962 8056957 4944408
7195059 3332565 3154348 5530740 4965315 7011341 5248675 6846116 6740378
5817539 9012238 7464656 9055304 8781313 9115954 9858761
latch_free_3 4306495 1931327 3327222 2181023 1764023 3238736 1637877 1922399 1786672
2312507 23083793 1091835 1862666 6949805 6453080 781313 2767384 1164596
2249654 440342 2076414 4062214 2408959 2886800 2177722 3332627 1237003
2098221 5986478 828048 2570183 2104943 2686288 2259440 2388341 2044871
1462208 2306024 1635624 2132461 783102 2677123 4743636 1292606 1499421
2439544 2753717 1905407 3017933 1651176 3686294 1088209 922402 2319624
2645539 1260079 4316997 1311242 2087917
rdbms_ipc_reply_3 27920139 31888722 29909917 27903331 27939736 29891410 29901186 29953756 27895419
29879828 28086221 29906822 27903067 29915873 29896307 31878918 28271932 29883910
29885870 29906113 27933209 27794807 29879997 28842040 30138759 12102954 0
0 0 0 0 0 0 0 0 0
0 4681562 0 0 1018 22981 0 0 0
383 0 0 0 0 0 0 0 0
0 0 0 3079 0 700 0 604 0
0 42240 0 0 0 508 2702
log_file_sync_3 4897094 5984818 8773859 6732708 2528699 21640100 4667088 9349937 6102384
5648418 16238557 9202562 4214425 12911555 8677281 1982443 5549138 2978201
4361914 2924674 3969540 4608851 4499043 5563593 7480457 8736114 4087525
7916623 9531711 4204114 4002465 4230277 4795701 6324440 4740469 5123660
2891044 8931372 4254477 6177218 5119836 4701245 9094181 6751040 6969569
8678671 10627609 8452116 13505152 9052070 9199762 6939591 39153732 18974176
71305938 25005388 6781095 3927165 enqueue_3 39067 87598 58429 93754
38807 270867 64623 76158 125199 39092426 73909741 82618 78157
12192579 30084139 30154100 30426042 30034564 30058775 30062114 27062239 27585615
30040241 23741759 139008 75112 46778 78246 30794 93934 850847
83964 91959 572765 18938 214037 112043 93392 513358 80052
121984 148309 1683718 73709 33330 73894 476811 2072430 34138
40273 88209 512731 460956 492492 2979781 50846 80892 84755
248836
db_file_sequential_read_3 107039051 109566587 127008942 97033615 74615747 70728654 109231682 93755676 86966703
117638046 76065788 88137834 66759488 89709570 80277614 50762622 58207938 61815450
65610580 51599402 74260828 78138965 49867005 21069074 17680159 11460495 12467891
33786092 33211896 40736046 49212320 41162084 45786601 24122201 55651100 34316957
11773008 62455231 48897870 67248119 64077462 88899277 89622175 34483526 57402401
35319735 30619562 61707668 34514340 39504714 37681062 34165368 24952296 30172023
25510144 40260331 83215876 78450352 56963813 33846491 28869234 52964974 38161443
43878778 31095295 26018747 58239638 19793917 7914975
itch_(checkpoint_incomplete)_3 0 0 0 0 285950959 247915390 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
Trends: Waits over Time
Quantity: Map Alone is 20,000
points
Quantity: Extent Mapping
Quantity: Extent Mapping with
Mouse
Quantity: Seeing Patterns:
Extent Anomalies
Quantity: Zoom Out, Over 8000
data pts
Correlating Data
Perspective:Tufte
Perspective:Tufte
Perspective: Oracle
Perspective: Oracle
Clutter
Clear Concise
Tuning Lists are Like Full Table
Scans
 Examples of tuning lists:
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6
Rule 7
Rule 8
Rule 9
Rule 10
Rule 11

Use an Index on the Tuning
Rules
Waits Details
SQL Analyze

Host Info User I/O Disk Analyze


Machine

Oracle

Locks Object Analyze


Waits + CPU

SQL Detail
Buffer Cache Analyze
Top SQL (user) Latches

Session Detail

Log File Sync

Network
Guiding User – Shallow Index
Machine

Oracle
User I/O
Waits + CPU

Buffer Cache Analyze


Top SQL

A GUI can prune the decision


by highlighting and/or
guiding the user (like a
shallow index tree)
Keeping Context - Schema

Machine

Orac
Waits + User I/O
le
CPU SQL Analyze

Buffer Cache Disk Object


Top SQL Analyze Analyze Analyze

A GUI can keep context giving


keeping making the tuning
process seem less like
Labyrinth
Keeping Contexkt - Example
Extending Visual Metaphors

Graphics can increase the density and


clarity
Buffer Cache Advisor Example
Ex Buffer Cache with Graph
Using Visual Metaphors
Larger Contention Area
Providing Interactivity :
Simulations
*** Attempts at Graphics***
First Try : 1994 Perot/Europcar
 Goal – no interaction needed
 Visible across the room
 Used statistics shown in dials and bar
graphs
 Used TCL/TK
Second try at a Graphical
Monitor

TCL/TK front end that communicated over TCP/IP


with daemon attached to the SGA and reading
out Oracle statistics data
Latch – V$latch
Library Cache – V$library Cache
Statistics – v$sysstat
Waits – v$system_event
Rollback Segments – v$rollstat
Row Cache v$rowcache
Data
 Values are Cumulative
 Necessitates snapshots
 Delta’s need to be calculated between
snapshots
 Normalizing Data needed to compare any
interval
Monitor Everything Approach
Wait Events Became the Focus
Improvements
• Scale Graphs Equally
Eliminate:
• Background waits
• Idle Waits
• Extraneous waits
• Wait counts
With Improvements
160K Log Buffer
50M Log Files
2000 Block_buffers
Log Files on Raw
4000 Block Buffers
8000 Block Buffers
Wait vs CPU over tuning tests
test elapse time wait time CPU %idle
1 2240 1328.16 45
2 1555 593.59 29
3 1365 408.15 23
4 1155 176.84 11
5 1055 81.18 6
6 1040 65.97 4
7 985 41.7 3

4000
wait tim e 3500
elaps e tim e 3000
2500
2000
1500
1000
500
0
1 2 3 4 5 6 7

p://oraperf.sourceforge.net/seminar/examples.html
Quest Paradigm: stacking
Waits
GUI: Buffer Busy Waits
GUI: Log File Switch
GUI: Log Buffer Space
GUI: Latch and Log Sync
GUI: All Tests 8.1.7
What do CPU and Wait Mean ?
Service Time
Service Time over 5 sec
Snapshots
Two Users Sharing the CPU
Two Users: Combining the
Service Times
Displaying CPU and Wait Time
Wait Bound System
CPU Bound System
Resulting Graphic Interface
TOP – Triangle of Oracle
Performance

HOST
System
ORACLE
System

Users SQL

SQL Sessions
Methodology

 1) Machine running correctly


– Run Q and Paging
 2) Oracle instance correctly configure
– Compare Waits to CPU
 3) Drill down to SQL and/or Sessions
– Root of all problems
– Possible group by Sessions, Modules,
Actions, Client id, Service or Machine
Machine Statistics
Which Machine Statistics to use?
– CPU

User
system
– Memory

Size
Free
Swap
– IO Wait
– Network
1 Machine Statistics Core
Only need:
• Paging
• Run Queue
Paging and Run Queue can cause latch free
and unreliable CPU statistics in Oracle
and are not directly detectible in Oracle
statistics. Other Machine statistics such
as Network and I/O problems can be seen
directly in Oracle statistics.
2 Oracle System Level
 Compare WAIT time to CPU
 If CPU bound look at CPU consumers
 IF wait bound drilldown to data relevant
to the highest wait type
3 Oracle Drill Down – SQL
 SQL is the source of All problems
 Aggregate SQL usage by Session
 Possibly Aggregate by
– Machine
– Module
– Action
– Client ID
– Service
Putting it Together Graphically

3
1) Host Information
Run Queue
• top sql ( and top user etc)

Paging
• PGA Advisor
• Shared Pool Advisor
• Buffer Cache Advisor

Else improve Hardware


2) Oracle System View
3) Oracle Correlation Data
*** Sampling ***
Snapshot Method has
Limitations

 Performance Page populated by


Snapshoting
 Snapshoting is prohibitively expensive for
the drill down data
Snapshots and Drilldown Data
 Top SQL by CPU
– Expensive
 Top SQL by Wait
– Doesn’t exist
 Top Session by CPU
– Feasible
 Top Session by wait
– Expensive
 Top Object
– Expensive (could be thousands)
Sampling: Solution for
Collecting Data
 Based on V$SESSION_WAIT
 Allows easy Aggregation by
– Session
– SQL
– Machine
– Module
– Action
– Client Id
 Identifies SQL, Users and Objects to track
Sample Data Example
Sampled Data Visually
Sampled Data for Top SQL
*** Developing a Drilldown ***

User I/O Example


Determine I/O Problems
 Problem SQL
 Hot File or Disk
 Hot Objects due to Sparseness or Row
Chaining
 Undersized Buffer Cache
Determine Solutions
 Tune SQL
 Redistribute I/O load or change I/O
subsystem
 Rebuild indexes or Tables
 Increase Buffer Cache size
Design Visual Interface
First, Orient the Viewer , give a context
Generic Drilldown for Waits
 SQL
I/O Specific Drilldowns: File
Response
I/O Specific Drilldowns: Hot
Objects
I/O Specific Drilldowns: Buffer
Cache
*** Story Board ***
IO Wait Problem
IO Look at Wait Groups
IO Click on Problem Group
IO Drill Down: Problem SQL
IO Drill Down: Problem SQL
FlyOvers
IO Drill Down: Buffer Cache
Size
IO Drill Down: Problem Disks
CPU Bound
CPU Bound: SQL?
CPU Bound: Users
CPU Drill Down on User

You might also like