You are on page 1of 24

AWR report analysis in depth-

part 1

1.Let me first create a test


case:I will take different OS
statistics and run time
statistics to compare with
AWR report generated after
execution.
A.About test case in lower CPU:-
I will create a table first
create table t (id number, sometext varchar2(50),my_date date)
tablespace data;

Now I will create a simple procedure to load bulk data


create or replace procedure manyinserts as
v_m number;
begin
for i in 1..10000000 loop
select round(dbms_random.value() * 44444444444) + 1 into v_m
from dual t;
insert /*+ new2 */ into t values (v_m, ‘DOES
THIS’||dbms_random.value(),sysdate);
commit;
end loop;
end;
This insert will be executed in 10 parallel sessions
using dbms_job.This will fictitiously increasded load
on database.
create or replace procedure manysessions as
v_jobno number:=0;
begin
FOR i in 1..10 LOOP
dbms_job.submit(v_jobno,’manyinserts;’, sysdate);
END LOOP;
commit;
end;
/

Now I will execute manysessions which will fork 10


parallel sessions
exec manysessions;

B.About test case in higer CPU:-


I will create a table first
create table t (id number, sometext varchar2(50),my_date date)
tablespace data;

Now I will create a simple procedure to load bulk


data.I will make commit out of loop to decrease
frequent commit.
create or replace procedure manyinserts as
v_m number;
begin
for i in 1..10000000 loop
select round(dbms_random.value() * 44444444444) + 1 into v_m
from dual t;
insert /*+ new2 */ into t values (v_m, ‘DOES
THIS’||dbms_random.value(),sysdate);
–commit;
end loop;
commit;
end;

This insert will be executed in 10 parallel sessions


using dbms_job.This will fictitiously increasded load
on database.
create or replace procedure manysessions as
v_jobno number:=0;
begin
FOR i in 1..10 LOOP
dbms_job.submit(v_jobno,’manyinserts;’, sysdate);
END LOOP;
commit;
end;
/

Now I will execute manysessions which will fork 10


parallel sessions
exec manysessions;

I will execute the test case on


server with low CPU (A) as well as
high CPU (B).

2.I will use oratop tool to


monitor the sessions
oratop – Utility for Near Real-time Monitoring of Databases,
RAC and Single Instance (Doc ID 1500864.1)

./oratop -f -d -i 10 / as sysdba

optionally you press x

Then you can put sql_id to see the execution plan as well.
For A.

For B

Section 1 - database
Global Database information
Version : Oracle major version
role : database_role
db name : db_unique_name
time [s]: time as of the most recent stats
(hh24:mi:ss)
up [T]: database uptime
ins [N]: total number of instance(s)
sn [c,N]: total user sessions (active/inactive)
us [c,N]: number of distinct users
mt [s,N]: global database memory total (sga+pga)
fra [N]: flashback recovery area %used, (red >
90%)
er [N]: diag active problem count (faults)
% db [s,N]: database time as %(dbtime/cpu) (red if
> 99%)
Section 2 - instance
Top 5 Instance(s) Activity
o Ordered by Database time desc
ID [c,N]: inst_id (instance id)
%CPU [m,N]: host cpu busy %(busy/busy+idle). (red if
> 90%)
LOAD [m,N]: current os load. (red if > 2*#cpu &
high cpu)
%DCU [m,N]: db cpu otusef as %host cpu. (red if >
99% & high AAS)
AAS [s,N]: Average Active Sessions. (red if >
#cpu)
ASC [c,N]: active Sessions on CPU
ASI [c,N]: active Sessions waiting on user I/O
ASW [c,N]: active Sessions Waiting, non-ASI (red if
> ASC+ASI)
ASP [m,N]: active parallel sessions (F/G)
AST [c,N]: Active user Sessions Total (ASC+ASI+ASW)
UST [c,N]: user Sessions Total (ACT/INA)
MBPS [m,N]: i/o megabytes per second (throughput)
IOPS [m,N]: i/o requests per second
IORL [m,T]: avg synchronous single-block read latency.
(red > 20ms)
LOGR [s,N]: logical reads per sec
PHYR [s,N]: physical reads per sec)
PHYW [s,N]: physical writes per sec
%FR [s,N]: shared pool free %
PGA [s,N]: total pga allocated
TEMP [s,N]: temp space used
UTPS [s,N]: user transactions per sec
UCPS [c,m,N]: user calls per sec
SSRT [c,m,T]: sql service response time (T/call)
DCTR [m,N]: database cpu time ratio
DWTR [m,N]: database wait time ratio. (red if > 50
& high ASW)
%DBT [s,N]: instance %Database Time (e.g. non-rac
shows 100%)
Section 3 - db wait events
Top 5 Timed Events
o Cluster-wide, non-idle
o Ordered by wait time desc
EVENT : wait event name. (red if active)
(RT) : Real-Time mode
WAITS : total waits
TIME(s) : total wait time in seconds)
AVG_MS : average wait time in milliseconds
PCT : percent of wait time (all events)
WAIT_CLASS : name of the wait class
Section 4 - process
o Non-Idle processes
o Ordered by event wait time desc
ID [N]: inst_id. (red if blocking)
SID [N]: session identifier. (red if blocking)
SPID [N]: server process os id
USERNAME : Oracle user name
PROGRAM : process program name
SRV : SERVER (dedicated, shared, etc.)
SERVICE : db service_name
PGA [N]: pga_used_mem. (red if continuously
growing)
SQL_ID/BLOCKER : sql_id or the final blocker's (inst:sid,
in red)
OPN : operation name, e.g. select
E/T [T]: session elapsed time (active/inactive)
STA : ACTive|INActive|KILled|CAChed|SNIped
STE : process state, e.g. on CPU or user I/O or
WAIting
WAIT_CLASS : wait_class for the named event
EVENT/*LATCH : session wait event name. Auto toggle with
*latch name.
(red if process is hung/spin)
W/T [T]: event wait time. (red if > 1s)

3.TOP output during activity


For A:-
We had 4 CPU,so our 6 jobs (status :-S means Sleeping) had to
wait for CPU.And other 4 jobs are running (Status :-R means
Running). Also note 86% CPU used because we are consuming 4
CPU out of 4 CPU).There is almost no idle CPU.

For B:-

We had 20 CPU,so our all 10 jobs are running (Status :-R means
Running).Also note 44.4 % CPU is used because out of 20 CPU we
are using 10 CPU.Hence there is 50% idle CPU.
4.VMSTAT report
For A:-

For B:-
5.The table size after 10
jobs completed.
SQL> select bytes/1024/1024/1024 from dba_segments where
segment_name=’T’;

BYTES/1024/1024/1024

——————-

7.375

6.AWR report 360 degree


analysis
A.Header Section
For A:-
For B:-

Same load completed 5 times faster when we have more CPU.

Sessions:=No of active sessions

Cursors/Session:=No of open cursor per session.If this


value is increasing or high indicates potential problem
of cursor leaking.

Elapsed :=The elapsed time indicates the duration of


the report between the 2 selected snapshots. Any other
duration figures can be compared back to this. A 30-60
minute reporting period is usually recommended.

DB time:=db time is the time spent in the database.For


example,If we have 10 parallel session each taking 20
minutes to execute,our DB time will be 200
minutes+other query execution time in 20 minutes
snapshot period.The DB Time is the time spent in the
database for the period of the report. If this is
significantly higher than the Elapsed time then this is
a good indicator of a heavily loaded system.

B.Load Profile
For A:-

For B:-
DB Time:-No of active sessions average during snapshot
period.This is derived by DB time/Elapsed time in
previous section.If the number his high,then probably
many active session are there in database on particular
point which may indicate bottleneck or opportunity to
deep analysis.

DB Cpu:-DB CPU is quite less than DB time ,so sessions


are waiting not much working on cpu.This may indicate a
problem if your DB cpu percent is quite less than DB
time.

Redo size:-11 MB per second.An increase in Redo size


and Block changes indicate increased
DML(INSERT/UPDATE/DELETE) activity.
Logical reads:-Logical reads is simply the number of
blocks read by the database, including physical (i.e.
disk) reads, and block changes is fairly self-
descriptive.If you find those number higher than
expected (based on usual numbers for this database,
current application workload etc.), then you can drill
down to the “SQL by logical reads” and “SQL by physical
reads” to see if you can identify specific SQL
responsible.

Block Changes:-This indicates high block change means


many transactions going on.

User Calls:-In particular, when the database is


executing many times per a user call, this could be an
indication of excessive context switching (e.g. a
PL/SQL function in a SQL statement called too often
because of a bad plan). In such cases looking into “SQL
ordered by executions” will be the logical next step.

Hard parse:-A hard parse occurs when a SQL statement is


executed and is not currently in the shared pool.A hard
parse rate greater than 100 per second could indicate
that bind variables are not being used effectively; the
CURSOR_SHARING initialization parameter should be used;
or you have a shared pool–sizing problem.

Parse:-This is soft parse+hard parse.A soft parse


occurs when a SQL statement is executed and it is
currently in the shared pool. A very high soft parse
rate could lead to problem of programmed application.

Logons:-Establishing a new database connection is also


expensive (and even more expensive in case of audit or
triggers). “Logon storms” are known to create very
serious problem.In fact log off is more expensive.This
indicates problem of not using proper connection
pooling.

Executes:-High no of executes always indicates


potential load to database.

Transactions:-High number of transactions indicates it


is OLTP system.

C.Time Model Statistics


for A:

For B:
sql execute elapsed time is 60%.That means DB spent 60% time of DB_TIME is executing
SQL query
which is OK for now .
DB CPU is used 20% of DB_TIME.
Now We can see PL/SQL execution elapsed time is 11% means many PL/SQL procedure is
being executed.
You can see which PL/SQL executed in SQL ordered by Elapsed Time.
Parse time elapsed and hard parse elapsed time is less which is good sign.If there
are bigger,we
need to check why query is taking more time to parse.

D.Instance Efficiency Percentages


For A:-

For B:-
Buffer Nowait %:-Less than 99 percent may indicate a
problem . This value is the ratio of hits on a request
for a specific buffer where the buffer was immediately
available in memory. If the ratio is low, then there
are (hot) blocks being contented for that should be
found in the Buffer Wait section.

Buffer Hit %:- Less than 95 percent may indicate a


problem. This value is the ratio of hits on a request
for a specific buffer when the buffer was in memory and
no physical I/O was needed.A hit ratio that is steadily
at 95 percent and then drops to 45 percent should be
checked for bad SQL or a dropped index (check the top
physical reads SQL) causing a surge in physical reads
that are not using an index or an index that has been
dropped

Library Hit % :-Less than 95 percent indicates problem.


A lower library hit ratio usually indicates that SQL is
being pushed out of the shared pool early (could be due
to a shared pool that is too small). A lower ratio
could also indicate that bind variables are not used or
some other issue is causing SQL not to be reused (in
which case, a smaller shared pool may only be a bandage
that potentially fixes a resulting library latch
problem)

In-memory Sort % :- less than 95 percent in OLTP


indicates problem. In an OLTP system, you really don’t
want to do disk sorts. Setting the MEMORY_TARGET or
PGA_AGGREGATE_TARGET (or SORT_AREA_SIZE in previous
versions) initialization parameter effectively
eliminates this Problem.

Soft Parse %:- Less than 95 percent indicates problem.


A soft parse ratio that is less than 80 percent
indicates that SQL is not being reused and needs to be
investigated.

Latch Hit %:- Less than 99 percent is usually a big


problem. Finding the specific latch will lead you to
solving this issue.

E.Foreground Events by total wait


time
For A:-

For B:-
This is one of the most important section.You must be
concerned if any wait event take abnormally high %DB
time apart for DB CPU.If DB CPU is high like 80%,then
your application is CPU bound.

DB CPU :-It is showing 19.5% means it is may be CPU


bound but because we have only 4 CPU and 10 parallel
sessions,CPU is not able to do much work.
Resmgr:cpu quantum:-Indicates it may be problem of allocating sufficient number of
CPU cycles.

F.Background wait events


For A:-
For B:-

The log file parallel write shows LGWR is waiting for blocks to be written to all
online redo log members in one group. LGWR will wait until all blocks have been
written to all members. So here we had 80% of background total time spent on log
file parallel write. The parameter of interest here is Avg wait (ms). In our case
it is 1ms which is a perfectly respectable figure. Obviously larger average wait
times point to slower I/O subsystem or poor I/O configurations.As a rule of thumb,
an average time for ‘log file parallel write’ over 20 milliseconds suggests a
problem with IO subsystem.
Another thing is log file parallel write is background wait event of log file
sync.You need to check if large commit
is happening in “user commits” section of “Instance Activity Stats”

G.Host CPU
For A:-

For B:-

Host CPU
Here you can notice %user is 84%.This really gives you
sign that your CPU are highly used.Load average 2.77
(after execution completed) means you are having
average 3 sessions running at one time and other 7
sessions are waiting for CPU.But most of the time load
average was around 10 (refer oratop and vmstat) as
there was 10 sessions running all the time during
snapshot period.

Instance CPU
%Total cpu indicates that Database is using 82.6 % cpu
of total CPU of the database server. This indicates
there are not many other database’s process running
currently.If you see less value here it may indicates
the database server total cpu may be used by other
application or database instances.

H.IO Profile
For A:-

For B:-
Here Total requests:=Database requests + Redo
requests=1046+322=1381(around) which is actually IOPS.

Total MB=Database (MB) + Redo (MB)=5.9+22.9=30.4 MB


which denotes database is generating 30 MB data per
second.

I.Memory Statistics
For A:-
For B:-

Host Mem(MB):-Total memory in the system

SGA use (MB):-Total SGA used

PGA use(MB):-Total PGA used.

Cache size mainly distributed Buffer cache(where data


buffer stored in memory)+Shared pool size.

A low value for the % SQL with exec>1 statistic


indicates databasee is not re-using shared SQL
statements, usually because the SQL is not using bind
variables.
J.OS Statistics
For A:-

For B:-

This is like vmstat report.Almost 83% cpu was used


during execution which is quite high.%busy high is good
means cpu are not using simultaneous multi threading.

You might also like