Roger Snowden, Center of Expertise, Oracle November 13, 2007

Even though Unix and Linux systems run on Symmetric Multi Processor architectures, concurrent processing is still constrained by the physical reality that only one process can be running on a CPU at any moment. When demand for CPU exceeds capacity, performance is adversely affected. Therefore, it is important for system administrators to understand their CPU usage and plan hardware resources accordingly. This article discusses simple methods to evaluate and analyze current CPU usage. It does not purport to provide full-scale capacity planning methods, nor does it provide guidance for statistical analysis. This article is intended for system administrators, database administrators, and managers who wish to determine CPU utilization on Unix and Linux platforms and to conduct basic capacity planning.

Since it is typical for more processes to be running than there are CPUs, operating systems provide a scheduling mechanism to permit sharing of CPUs. In Unix and Linux systems, this scheduling mechanism switches processes between three states, as illustrated below.

Processes are generally in one of three states at all times: running, suspended (sleeping) and ready-to-run (ready queue, or run queue).

In the running state, a process is actually executing instructions on a CPU. This process is said to be “on CPU”, and continues to run until interrupted by the operating system, or the process voluntarily yields CPU. Interrupts occur for one of several reasons: 1) another process with a higher priority requires CPU; 2) another process with similar priority requires CPU and the scheduler is allocating CPU time by a round-robin or first-in-first-out time sharing algorithm. When a process leaves the execution state, it enters either the suspended state or the ready-to-run state, as described below.

SUSPENDED

When a process must wait for a resource, such as a disk I/O operation, that will take considerable time, it enters the suspended, or sleeping, state, during which time it is dormant. A process can voluntarily enter this state by requesting that the operating system wake it up after a predetermined time interval (typical behaviour of Oracle wait events), or it can be placed in the suspended state by the operating system when it makes a request for a resource that is expected to take considerable time, such as reading disk or awaiting a network socket message. When the awaited operation is complete, or the predetermined sleep time is complete, the process “wakes up” and is placed into the ready-to-run state.

READY-TO-RUN

Processes ready to run are entered in the run queue, and are said to be in the ready-to-run state. The run queue is an ordered structure that allows the operating system kernel to select the next process to be placed on CPU for execution. Each CPU on the system will have its own run queue. When a process has its execution interrupted because of an elapsed time-slice, or is preempted because another process has a higher priority, it is placed directly from the CPU to the run queue, and does not enter the suspended state.

As the name implies, ready-to-run processes require CPU resources. Since only one process at a time can actually be executing on a given CPU, many processes competing for CPU time will be in the run queue concurrently, increasing the length of the queue. Thus, a non-zero run queue length is an indicator of demand for CPU resources in excess of CPU capacity.

Contrast this with situations where many processes are competing for I/O and transition from CPU to suspended state prior to entering the run queue. When many processes are suspended, the run queue may be small or empty, and CPU resources are in fact plentiful. This explains why surges in I/O on a system often result in dramatically reduced CPU utilization as processes are sleeping, not waiting for a CPU to become available.

ORACLE WAIT EVENTS

In a busy Oracle database server system, other than inevitable momentary transitions through the run queue, most user (session) processes will typically be in the suspended state, either waiting for I/O completion or waiting for a request message from the client. For example, between SQL parse/execute/fetch cycles, the Oracle user process is normally shown by the v$session view to be in the "SQL*Net message from client" wait event, and that user process will be sleeping.

When an Oracle process is in a wait event, the Oracle instance itself tracks this and Oracle "knows" it is waiting. Ordinarily the Oracle session notes the system time just before entering a wait event, notes the system time as it wakes up to continue execution, then compares the times to determine how long it waited. This information gets added to the session and system statistics as “wait time” for the particular resource for which the process was waiting. Wait event information is then reported in such views as v$session and is accumulated in such reporting vehicles as AWR and Statspack.
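The wait-time bookkeeping described for Oracle wait events (note the time before the wait, note it again on wake-up, add the difference to a per-event total) can be sketched in a few lines of Python. This is only an illustration of the idea, not Oracle's implementation; the names `wait_event` and `wait_totals` are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wait time in seconds, keyed by event name (hypothetical store).
wait_totals = defaultdict(float)


@contextmanager
def wait_event(name, clock=time.monotonic):
    """Time a wait: read the clock before and after, add the difference."""
    started = clock()  # noted just before entering the wait
    try:
        yield
    finally:
        # noted again on wake-up; the difference is recorded as wait time
        wait_totals[name] += clock() - started


if __name__ == "__main__":
    with wait_event("SQL*Net message from client"):
        time.sleep(0.1)  # stands in for a blocking read on a socket
    print(dict(wait_totals))
```

Note that a timer of this kind only brackets waits the process itself announces. Time the process spends in the run queue outside such a bracket is never attributed to any event.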

ORACLE “LOST” TIME

When an Oracle process migrates from CPU to ready-to-run, the Oracle session process has no indicator of entering the non-running state, and so does not track run queue time as waiting for CPU. Hence, time spent in the run queue is generally “lost" time with respect to Oracle statistical metrics. Essentially, time spent in run queues is a symptom of demand for CPU in excess of capacity. Since multiple processes in ready-to-run state at the same time will also all be in the run queue structure at the same time, waiting for an available CPU for execution, the length of the run queue is an effective measure of CPU demand in excess of capacity. For that reason, tracking run queue sizes is an effective indicator of CPU saturation.

This phenomenon can be seen in Oracle SQL trace files, both in the raw trace file as well as the tkprof-formatted output. When significant differences between reported CPU time and elapsed time are observed, this is evidence the process is CPU bound. Observe the trace fragment sample shown below:

    PARSE #2:c=20000,e=19091,p=0,mis=1,r=0,dep=0,og=1,tim=489408753167
    PARSE #2:c=20000,e=58484,p=0,mis=1,r=0,dep=0,og=1

    SQL Trace (extended) fragments

All times are reported in microseconds. The field “c=” shows CPU time consumed by the parse step, as reported by the operating system. The “e=” field represents elapsed time for the parse event, as tracked by the Oracle process.

The first line was taken during a period when the measured run queue showed the server was not CPU-bound at all. In this case, CPU time is 20,000 microseconds and elapsed time is 19,091 microseconds. Although this operating system, Linux, rounds its reported CPU time to the nearest centisecond, it is nonetheless clear the two values are essentially the same.

The second line was taken from a period when the run queue was 10, so clearly the server was rather CPU bound. In this case, CPU time consumed by the parse step is still 20,000 microseconds, but elapsed time as recorded by the session is nearly three times that value. Essentially, approximately one third of the total elapsed time was spent actually executing parse code, while nearly two thirds of that time was spent in ready-to-run state. Since a large portion of the code execution time was spent in the run queue, and the Oracle session is not aware it is in the non-running state, it can only track the overall system clock time that has elapsed from start of the parse to its completion.

UTILIZATION AND IDLE %

The vmstat utility, available on all Unix and Linux systems, gives us an overall indication of CPU utilization with the % Idle column, usually shown as "id". When a system is completely idle, "id" will be 100. When a system is completely busy, "id" will be 0.

A non-zero run queue size is not necessarily an indicator of CPU saturation, since processes transition between the three states normally. At any point in time, it is likely at least one process will be in the run queue, but it may not stay in the queue long. That is, a large scale and busy production server may well show an average run queue size greater than zero, and it is typical for some processes to be in the ready-to-run state most of the time. However, while some processes may spend at least some time waiting for CPU, the time spent in the run queue may be insignificant, so we need more information to determine CPU utilization and saturation.

To get a sense of overall CPU utilization versus capacity, we can consider the "r" run queue and "id" % idle columns together. Observe the sample vmstat report, taken at two second intervals:

    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
     1  1   3456  15820  47360 3463828    0    0  3758    26 1487  1200  9 14 37 41
     0  1   3456  16396  47324 3464124    0    0  3778    64 1483  1171 15 13 46 27
     9  0   3456  16268  47324 3464904    0    0   320    68 1065   616 54 40  4  2
    12  0   3456  16348  47324 3464904    0    0     6   932 1152   791 54 45  1  0
     8  0   3456  16348  47324 3464904    0    0     4    22 1016   612 58 41  1  0
    10  0   3456  16668  47324 3464904    0    0     0    32 1009  1310 33 25 42  0

Notice as the run queue (“r”) values increase, the % idle (“id”) values decrease, although not in lockstep. So, as CPUs become completely busy, the run queue size will suddenly increase. This is typical of any queueing system when a point of saturation is reached.

The graph below illustrates congestion in a typical queueing system. In the graph, the horizontal axis illustrates “arrivals”, such that as workload increases, response time increases. In our CPU case, this would be arrival of processes at the run queue. So, as the arrival rate increases, so does wait time, i.e. “waiting” for CPU. The increase is gradual until the saturation point is reached, at which time response time increases dramatically. Once the capacity of the CPU is reached, the backlog of arrivals grows exponentially, and thus application response time increases dramatically. In Oracle database servers, this sudden, exponential time increase can be observed in disk I/O, internal waits, such as latches, pins or enqueues, or as a sharp increase in CPU run queue length.
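Returning to the two SQL trace PARSE fragments discussed under ORACLE “LOST” TIME, the "one third executing, two thirds in the run queue" estimate can be checked with a little arithmetic. The Python sketch below is a minimal illustration, not part of any Oracle tool; the function names are hypothetical, and it assumes the `key=value` field layout visible in those fragments. Attributing the whole gap between elapsed and CPU time to the run queue is the article's reasoning for a parse step with no physical reads (`p=0`), not a general rule.

```python
import re


def parse_trace_line(line):
    """Return the key=value fields of a trace line as a dict of ints."""
    _, _, fields = line.partition(":")
    return {key: int(val) for key, val in re.findall(r"(\w+)=(\d+)", fields)}


def unaccounted_time(line):
    """Elapsed time not covered by CPU time, in microseconds and as a fraction.

    CPU time is rounded by the operating system, so a small negative gap
    is clamped to zero.
    """
    fields = parse_trace_line(line)
    gap = max(fields["e"] - fields["c"], 0)
    return gap, gap / fields["e"]


if __name__ == "__main__":
    quiet = "PARSE #2:c=20000,e=19091,p=0,mis=1,r=0,dep=0,og=1"
    busy = "PARSE #2:c=20000,e=58484,p=0,mis=1,r=0,dep=0,og=1"
    for label, line in (("quiet", quiet), ("busy", busy)):
        gap, share = unaccounted_time(line)
        print(f"{label}: {gap} microseconds unaccounted ({share:.0%} of elapsed)")
```

For the busy fragment the gap is 58,484 minus 20,000, or 38,484 microseconds, which is just under two thirds of the elapsed time.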

VARIANCE VERSUS AVERAGE

In measuring usage of a resource, versus the capacity of that resource, averages are often used. While an average utilization can be useful in predicting resource needs, it can also be misleading. Consider the vmstat output below:

[Figure: "Busy CPU". Utilization and run queue plotted over time; series: % Util, Avg % Util, Run Queue.]

This was extrapolated from a vmstat report from a “busy”, but not overloaded system. The “id” column was converted into % Utilization by subtracting the idle percent value from 100 (idle is the inverse of utilization).

As you can see from the middle horizontal dotted line, the average utilization is 60%. This implies a remaining capacity, or “headroom”, of 40%. However, at regular intervals during this sampling period, the chart shows utilization peaking at nearly 100%. If the system administrator of this server imagines there is actually 40% unused CPU capacity, then increasing the workload without increasing CPU capacity will definitely result in increased response time and degraded performance. Only by considering variance (the peaks and valleys of a data sample) can you get a meaningful picture of what is going on.

However, considering only variance can be equally problematic. Now, consider the same server, under a higher workload:

[Figure: "Saturated CPU". Utilization and run queue plotted over time; series: % Utilization, Run Queue, Avg Run Q.]

In this case, the % Utilization line is nearly perfectly straight, at 100%, without variance. When a server’s CPUs are saturated with work, they cannot become busier than 100%. However, note the Run Queue line near the bottom of the chart, which shows considerable variance now. As pointed out earlier, when such a bottleneck is reached, processes begin to wait for CPU, in the run queue. At the point of saturation, congestion grows exponentially, as would be evident if the workload is increased beyond this point.

So, when evaluating CPU utilization, consider both average utilization as well as variance. Correspondingly, consider average and variance of run queue length.

There is no universal rule of thumb to determine what is the “right” run queue size. On some servers, under busy conditions, it may be perfectly normal for the run queue to average 4 processes, and for that server to be performing well. This is because it is normal for processes to be transitioning through the run queue, even if the time spent in the queue is minimal. The best way to evaluate “normal” performance metrics is to capture information, such as vmstat output, during busy periods when performance is acceptable. This allows some comparison to be made during heavier, poor-performance periods.

TOOLS FOR THE JOB

All hardware vendors provide tools capable of measuring CPU utilization. Many are quite specialized and effective for capacity planning. However, vmstat is universally available and entirely adequate for the level of analysis discussed in this article. Better yet, it is free of charge.

OSWATCHER

Oracle Support’s Center of Expertise has developed a script based tool, OSWatcher, that will capture not only vmstat output, but that of other available operating system performance monitoring tools, such as top, iostat and mpstat. It is a shell script tool and will run on Unix and Linux servers. It operates as a background shell process and runs the native operating system utilities at configurable intervals, usually 30 seconds, and retains an archive of the output for a default period of 48 hours. These values may be increased in order to obtain and retain more information when evaluating performance, and to capture baseline information during important cycle-end periods, or during pre-production testing in preparation for an upgrade.

OSWatcher is available from Metalink as note 301137.1. Also, Oracle recommends customers download and install OSWatcher on all production and test servers that need to be monitored.

When reading the output captured from OSWatcher, the vmstat report contains three lines of output from each sample, and resembles this text:

    Linux    OSW v2.1.0    XXXsrv44
    zzz ***Mon Oct 29 07:00:37 EDT 2007
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
    10  1      0 3262380  49212 591428    0    0    17   366  588   397 69  1 19 10
    14  0      0 3262052  49212 591428    0    0     0    12 1056   569 98  2  0  0
     7  0      0 3261988  49212 591428    0    0     0     0 1028   508 100 0  0  0
    zzz ***Mon Oct 29 07:01:39 EDT 2007
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
    10  0      0 3262580  49212 591948    0    0    17   365  587   396 69  1 19 10
    15  0      0 3260652  49212 591948    0    0     0   380 1105   591 98  2  0  0
     8  2      0 3260652  49212 591948    0    0     0   100 1045   501 100 0  0  0
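To make the layout of an OSWatcher vmstat archive concrete, the following Python sketch groups the numeric vmstat rows under the "zzz" timestamp line that precedes them. It is an illustration written for this sample only, not the author's tool (the Perl script in Appendix A is); `split_samples` is a hypothetical name, and the month abbreviation is assumed to be in English.

```python
from datetime import datetime

SAMPLE = """\
zzz ***Mon Oct 29 07:00:37 EDT 2007
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
10  1      0 3262380  49212 591428    0    0    17   366  588   397 69  1 19 10
14  0      0 3262052  49212 591428    0    0     0    12 1056   569 98  2  0  0
 7  0      0 3261988  49212 591428    0    0     0     0 1028   508 100 0  0  0
zzz ***Mon Oct 29 07:01:39 EDT 2007
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
10  0      0 3262580  49212 591948    0    0    17   365  587   396 69  1 19 10
15  0      0 3260652  49212 591948    0    0     0   380 1105   591 98  2  0  0
 8  2      0 3260652  49212 591948    0    0     0   100 1045   501 100 0  0  0
"""


def split_samples(text):
    """Return a list of (timestamp, rows) pairs, one per "zzz" marker."""
    samples = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "zzz":
            # e.g. zzz ***Mon Oct 29 07:00:37 EDT 2007 (weekday and zone unused)
            _, _weekday, month, day, clock, _zone, year = tokens
            when = datetime.strptime(
                f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y"
            )
            samples.append((when, []))
        elif samples and all(tok.isdigit() for tok in tokens):
            # A purely numeric line is a vmstat data row for the current sample.
            samples[-1][1].append([int(tok) for tok in tokens])
    return samples


if __name__ == "__main__":
    for when, rows in split_samples(SAMPLE):
        print(when.isoformat(), "rows:", len(rows), "last r:", rows[-1][0])
```

Header lines ("procs" and the column names) contain non-numeric tokens and are skipped, so each sample ends up with exactly its three data rows.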

PERL SCRIPT

To facilitate easier analysis of OSWatcher vmstat archives, a perl script is provided. The script will accept a vmstat file in OSWatcher archive format and produce a space-separated file, with column headings across the top, suitable for import into a spreadsheet program such as Microsoft Excel. The complete text of the perl script is included as Appendix A. No warranty or representations are made for the script, other than it worked with OSWatcher 2.0.1 on a Linux machine, RedHat EL 2.6.9-42.0.10.0.1 at time of development and testing. For questions or comments on the script, please contact the author directly at roger.snowden@oracle.com.

The format of vmstat is slightly altered by OSWatcher, as it adds identifying information at the beginning of each hourly archive file, and places a timestamp before each sample. It also captures three lines of output. The first two lines should be ignored, and only the last of the three lines from each sample regarded for analysis purposes. The first line of any vmstat contains cumulative information and is inaccurate for the sample assumed by OSWatcher. The second line from the OSWatcher capture is influenced by the startup of vmstat itself, and should also be disregarded. Only the third line is accurate and useful for analysis purposes.

The output of the script will look like this:

    timestamp r b swpd free buff cache si so bi bo in cs us sy id wa
    1194631251 0 0 3364 30412 57624 3457100 0 0 0 0 1002 1004 0 0 100 0
    1194631311 0 0 3364 27964 57624 3454500 0 0 0 44 1009 985 0 0 99 1
    1194631371 0 0 3364 33028 57628 3454756 0 0 0 60 1009 1032 0 1 98 2
    1194631431 0 0 3364 26532 57632 3454752 0 0 0 48 1008 1004 0 0 99 1
    1194631491 0 0 3364 32980 57636 3454748 0 0 0 48 1010 991 0 0 100 0

Each line begins with a timestamp integer, representing the number of seconds since January 1, 1970 on most machines. This field is placed in each line to permit calculation of the time interval between samplings by OSWatcher, since there is no other simple way of getting that information. Thus, the timestamp of the first line, subtracted from the timestamp of the second line, will calculate the number of seconds between OSWatcher vmstat lines. The fields should line up with column headings.

To use the file, open it in Excel as a “.txt” file, and choose “delimited” from the import wizard. On the next wizard screen, select “space” as the delimiter. To produce a line chart, simply use the chart wizard and select both the “r” and “id” columns to produce a two-line chart for run queue and percent idle values. If you want to use percent utilization instead, create another column to the right of the imported table, label the column “% Util” or something similar, and compute values for each cell in the column as 100 minus the contents of the “id” column, then use that column of values in your chart.

In addition, you can use the data analysis wizard to create statistical descriptions of your data. A tutorial on statistical analysis is beyond the scope of this paper, however.
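For readers who prefer a script to a spreadsheet, the same calculations (the interval between timestamps, % utilization as 100 minus "id", and simple average and peak figures) can be sketched in Python. This is a minimal illustration under the assumption that the input has the space-separated layout of the script output described in this section; the function names are hypothetical, and the use of population standard deviation as the measure of variance is a choice made for the example.

```python
import statistics

SAMPLE = """\
timestamp r b swpd free buff cache si so bi bo in cs us sy id wa
1194631251 0 0 3364 30412 57624 3457100 0 0 0 0 1002 1004 0 0 100 0
1194631311 0 0 3364 27964 57624 3454500 0 0 0 44 1009 985 0 0 99 1
1194631371 0 0 3364 33028 57628 3454756 0 0 0 60 1009 1032 0 1 98 2
1194631431 0 0 3364 26532 57632 3454752 0 0 0 48 1008 1004 0 0 99 1
1194631491 0 0 3364 32980 57636 3454748 0 0 0 48 1010 991 0 0 100 0
"""


def load_rows(text):
    """Turn the space-separated text into one dict per data line."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, data = lines[0], lines[1:]
    return [dict(zip(header, map(int, row))) for row in data]


def summarize(rows):
    """Sampling intervals plus average, peak and spread of util and run queue."""
    stamps = [row["timestamp"] for row in rows]
    util = [100 - row["id"] for row in rows]  # the "% Util" column
    runq = [row["r"] for row in rows]
    return {
        "intervals": [later - earlier for earlier, later in zip(stamps, stamps[1:])],
        "util_mean": statistics.mean(util),
        "util_peak": max(util),
        "util_stdev": statistics.pstdev(util),
        "runq_mean": statistics.mean(runq),
        "runq_peak": max(runq),
    }


if __name__ == "__main__":
    for name, value in summarize(load_rows(SAMPLE)).items():
        print(f"{name}: {value}")
```

Reporting the peak and spread alongside the mean follows the article's advice to look at variance as well as the average. On a nearly idle sample such as this one, all of the figures are close to zero.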

APPENDIX A

Perl source for osw_vmstat_parse.pl

#!/usr/bin/perl -w
## osw_vmstat_parse.pl
## Utility to parse OSWatcher vmstat archives for spreadsheets
## November 9, 2007
## Copyright (c) 2007 Oracle, all rights reserved.
## Author: Roger Snowden, Oracle Support, CoE roger.snowden@oracle.com
##
## History
## 11/09/2007 rsnowden: initial cut, v 1.0
##
## Usage:
## perl osw_vmstat_parse.pl <filename>
## where <filename> is the name of an individual OSW vmstat archive file
## The utility will produce an output file in the form of:
## osw_vm_yymmdd_hhmm.txt
## where yymmdd_hhmm is year, month, day, hours taken from the time portion of
## input archive file.
##
##
use Time::Local;

## build hash table of month name to string
%Month = (
    "Jan" => 0,
    "Feb" => 1,
    "Mar" => 2,
    "Apr" => 3,
    "May" => 4,
    "Jun" => 5,
    "Jul" => 6,
    "Aug" => 7,
    "Sep" => 8,
    "Oct" => 9,
    "Nov" => 10,
    "Dec" => 11
);
%MetaColumns = ();
%ReverseColumns = ();

# initialize stuff here
$version = "1.0";
$carlage = 0; ## handy counter
$definedMeta = "FALSE";
$lineCount = 0; ## to store overall "good" lines

if (! $ARGV[0])
{
    print "\n";
    print "osw_vmstat_parse.pl  Version ", $version, " Usage: \n";
    print "perl osw_vmstat_parse.pl <filename> \n";
    print "where <filename> is a vmstat archived file produced by OSWatcher.\n";
    exit 1;
}

$fileName = $ARGV[0];
# will be in this form: hostname.oracle.com_vmstat_07.11.09.1300.dat
# chop up name, grab date
($host, $dummy, $date_part) = split("_", $fileName);
$year = substr($date_part, 0, 2);
$month = substr($date_part, 3, 2);
$day = substr($date_part, 6, 2);
$hours = substr($date_part, 9, 4);

## create output filename from input filename
## osw_vm_yymmdd_hhmm.txt
## while we have the values, build output filename
$fileOut = "osw_vm_" . $year . $month . $day . "_" . $hours . ".txt";

open(INFILE, $fileName) or die "File ", $fileName, " cannot be opened.\n";

while ($line = <INFILE>)
{
    # first, chop into array of tokens
    @thisline = split(" ", $line);
    # skip blank or one-token lines (avoids uninitialized-value warnings under -w)
    next unless @thisline >= 2;

    # grab version of OSW
    if ($thisline[1] eq "OSW" )
    {
        ($platform, $dummy, $version, $host) = @thisline; $platform="";
    }

    # parse out date/time stamp
    if ($thisline[0] eq "zzz" )
    {
        #zzz ***Mon Oct 29 10:01:24 EDT 2007
        ($junk, $wkday, $month, $mday, $time, $zone, $year) = split(" ", $line);
        $junk = $zone; ## just to eliminate spurious warnings
        $wkday = substr($wkday, 3);
        ($hours, $min, $sec) = split(":", $time);
        $timeseconds = timelocal($sec, $min, $hours, $mday, $Month{$month}, $year);
    }

    if ($thisline[0] eq "procs") { }; # do nothing, header thing

    if ($thisline[0] eq "r" && $definedMeta eq "FALSE")  # col hdrs
    {
        $colCount = 0;
        ## save off a "list" of columns, in original order
        @columnNames = @thisline;
        foreach $column (@thisline)
        {
            # first the obverse
            $MetaColumns{$column} = $colCount;
            # then the reverse
            $ReverseColumns{$colCount} = $column;
            $colCount++;
        }
        $definedMeta = "TRUE";
    }

    ## now eat first two lines of vmstat output
    if ($thisline[0] =~ /^\d+$/ && $thisline[1] =~ /^\d+$/ )
    {
        $carlage++;
        if ($carlage == 3)  # ...and the number of the counting shall be three.
        {
            $bucketNumber = 0;
            foreach $bucket (@thisline)
            {
                # one array per column, named after the column heading
                $bucketName = $ReverseColumns{$bucketNumber};
                push @$bucketName, $bucket;
                $bucketNumber++;
            }
            push @timestamp, $timeseconds;
            $carlage = 0; ## reset to a value suitable for carl
            $lineCount++; ## bump "good" lines, so we know later
        }
    }
}

## knock out a text file of stuff
## osw_vm_yymmdd_hhmm.txt
open (TXTOUT, "> $fileOut") or die "Cannot open ", $fileOut, " for writing\n";
print TXTOUT "timestamp ";
foreach $col (@columnNames)
{
    print TXTOUT $col, " ";
}
print TXTOUT "\n";

$lines = 0;
while ($lines < $lineCount)
{
    print TXTOUT $timestamp[$lines], " ";
    foreach $col (@columnNames)
    {
        print TXTOUT $$col[$lines], " ";
    }
    print TXTOUT "\n";  ## never forget the linefeed!
    $lines++;  ## and bump da pointer
}
close TXTOUT;