
Monitoring hardware with UNIX tools

Mar 7, 2001 Donald Burleson © 2001 TechRepublic, Inc.

If an Oracle database server has a hardware problem, such as an overloaded CPU, excessive memory swapping, or a disk I/O bottleneck, no amount of tuning within the Oracle database itself is going to improve performance. So the first thing an Oracle professional must examine during tuning is the server's hardware.

Because Oracle databases run on dozens of different platforms, from mainframes to personal computers, it's very difficult to give guidelines for tuning every type of server environment. However, since more than 85 percent of Oracle databases run in the UNIX environment, we're going to discuss some of the UNIX tools for determining whether a UNIX database server is undergoing stress.

The vmstat utility

One of the most common tools used in monitoring hardware for an Oracle database is the UNIX vmstat utility. This utility takes an elapsed-time snapshot of what's going on in the server environment. By issuing the vmstat command, we can take a look at the run queue for the CPU, the memory page-in and page-out activity, and the activity within each CPU on the database server.

Some UNIX system administrators and Oracle DBAs have figured out methods for capturing vmstat output and placing it in Oracle tables for more detailed analysis. But in order to conduct meaningful analysis, you must know what to look for.

It's important to recognize that in each of the different UNIX flavors (Solaris, AIX, HP/UX), vmstat has different display formats. Although the vmstat output is different depending on the flavor of UNIX, all flavors share some important server metrics. In Table 1 below, the important metrics appear in bold.

Table 1

As you can see, each UNIX flavor of vmstat provides different information about the current status of the server. But despite these differences, only a small number of metrics is important for server monitoring. These include:

- r (runque): Shows the number of tasks waiting for CPU resources. When this number exceeds the number of CPUs on the server, a CPU bottleneck exists.
- pi (page-in): Occurs when the server is experiencing a shortage of RAM. Any nonzero value for pi indicates excessive swap activity.
- us (user CPU): Shows the amount of CPU used by the user state.
- sy: Shows the percentage of CPU being used to service system tasks.
- id (idle): Shows the percentage of CPU that is idle.
- wa (wait, AIX only): Shows the percentage of CPU that is waiting on external operations such as disk I/O.
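As a rough illustration of pulling these six metrics out of captured vmstat text, here is a minimal Python sketch. The sample follows the AIX column layout, but every number in it is invented, and the names `SAMPLE`, `KEY_METRICS`, and `parse_vmstat` are hypothetical. Other UNIX flavors use different columns, so the header names would need adjusting.

```python
# Hypothetical AIX-style vmstat sample; the numbers are invented for illustration.
SAMPLE = """\
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 2  1 52000  1200   0   0   0   0    0   0 450 3200 510 35 10 50  5
"""

# The metrics the article singles out as important.
KEY_METRICS = ("r", "pi", "us", "sy", "id", "wa")


def parse_vmstat(text):
    """Return the key metrics from the last data row of vmstat-style text."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # The column-name row is the one that starts with the run queue column "r".
    header = next(cols for cols in rows if cols[0] == "r")
    data = rows[-1]
    # "sy" appears twice in this layout (system calls, then system CPU).
    # Building the dict left to right keeps the last one, the CPU column.
    values = dict(zip(header, (int(v) for v in data)))
    return {name: values[name] for name in KEY_METRICS if name in values}


metrics = parse_vmstat(SAMPLE)
print(metrics)
```

Looking columns up by header name rather than by position is a deliberate choice here, since the display format differs between flavors.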

Note that the CPU metrics are expressed as percentages. Hence, all of the CPU values (us+sy+id+wa) will always sum to 100. Let's examine each of the important measures of server hardware performance.

Run queue (r)

The run queue values are usually the very first values displayed on the left of all vmstat outputs. In the UNIX environment, the run queue is used to display the number of active tasks that are currently waiting for CPU resources. Hence, it is important that the run queue values never exceed the number of CPUs on the server. For example, for an Oracle database server with eight CPUs (8-way SMP), any run queue value in excess of 8 would indicate that the CPU is being overloaded and tasks are waiting for service.

RAM page-in (pi)

The use of RAM on the database server is also very important. In any virtual memory environment, it is not uncommon to see RAM pages moved out to a swap disk. This is a special disk area in UNIX that's reserved for holding memory pages so that the processor is capable of addressing RAM in excess of its full capability. While page-out operations are a normal part of any server's operations, page-in operations indicate that the real amount of available RAM has been exceeded and that additional RAM pages are required on the server. Whenever page-ins occur, the memory contents must be read in from the swap disk. This page-in activity is very time-consuming, and memory page-in operations can cause significant slowdowns of any Oracle database.

The popular Ion tool is the easiest way to analyze Oracle OS parameter performance, and Ion allows you to spot hidden OS-related performance trends. Ion is our favorite Oracle tuning tool.
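The two rules of thumb above (a run queue larger than the CPU count, and any nonzero page-in value) can be expressed as a short check. This is only a sketch: the function name `check_server` and the dictionary layout are assumptions made for illustration, not part of vmstat or any Oracle tool.

```python
def check_server(metrics, cpu_count):
    """Apply the article's two rules of thumb to one vmstat sample.

    `metrics` is a hypothetical dict holding at least the run queue ("r")
    and page-in ("pi") values. Returns a list of warning strings.
    """
    warnings = []
    # Rule 1: more waiting tasks than CPUs means the CPU is overloaded.
    if metrics["r"] > cpu_count:
        warnings.append(
            f"CPU bottleneck: run queue {metrics['r']} exceeds {cpu_count} CPUs"
        )
    # Rule 2: any page-in activity points to a shortage of RAM.
    if metrics["pi"] > 0:
        warnings.append(f"RAM shortage: {metrics['pi']} page-ins reported")
    return warnings


# Invented samples for an 8-way SMP server.
quiet = check_server({"r": 8, "pi": 0}, cpu_count=8)
busy = check_server({"r": 11, "pi": 4}, cpu_count=8)
print(quiet)
print(busy)
```

A run queue of exactly 8 on the eight-CPU server raises no warning, matching the "in excess of 8" wording above.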

Disk bottlenecks

Some versions of vmstat, such as the one in IBM's AIX environment, allow the DBA to get a general idea that an I/O process is slowing down the system. In AIX, there's a wait (wa) column that indicates the percentage of CPU that is in the wait state, waiting for other resources on the server.

An IBM disk I/O monitor

For other UNIX environments, such as Solaris, HP/UX, and Linux, the UNIX iostat utility is generally used to take a look at the I/O activity on each device that's been hooked up to the Oracle server. For more information on using the iostat utility with Oracle, see my article "Tuning Disk I/O in Oracle8" in Oracle Magazine.

In today's disk architectures, we commonly find that disk striping is used to alleviate these types of disk I/O bottlenecks. The RAID 0+1 method effectively does block-level striping across the disk devices, thereby ensuring that the load rises and falls evenly across all of the disks. However, it's important to recognize that individual disk devices for Oracle databases are becoming larger all the time. Today it's not uncommon to see an individual disk with 73 gigabytes of storage. In such an environment, it's possible to store an entire Oracle database on a single disk. Without disk array caching tools such as EMC disk caching, the Oracle DBA might find that the small, frequently referenced area of disk is constantly being recalled, causing the read-write heads on the disk device to have a significant impact on the overall performance of the database.

This type of I/O bottleneck can also commonly be seen in cases where the DBA is running hot backups while the database is active. The Oracle hot backup task is attempting to read the database from disk, cylinder by cylinder, while the online database engine is attempting to randomly access information on the disk. The contention is seen in the demand for the read-write heads on the disk, as they relocate themselves under competing cylinders, and in the significant slowdown of the system during hot backups.

Database server trend analysis

Many UNIX system administrators have utilities that can capture and plot the server stress over a period of time. These server statistics are very useful for the Oracle DBA because they can show periods where the server is overloaded. In the example in Figure A, we see a plot of CPU and CPU waits, averaged by the hour of the day. It is clear that this database is experiencing periods of high activity each day at 5:00 A.M. and at 11:00 P.M.

Plotting database server statistics by hour of the day

We can also plot abnormal conditions such as page-in operations. Alert reports, such as the one shown in Figure B, can be created and e-mailed to the DBA to show when the Oracle server is short on memory.
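The hour-of-day averaging behind this kind of plot, along with a simple page-in alert, might look like the Python sketch below. The samples are invented, and the tuple layout and the name `average_by_hour` are assumptions. A real site would read captured vmstat rows from wherever they are stored.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical captured samples: (timestamp, CPU busy percent, page-ins).
SAMPLES = [
    ("2001-03-05 05:10", 92, 0),
    ("2001-03-05 05:40", 88, 6),
    ("2001-03-05 14:20", 30, 0),
    ("2001-03-06 05:15", 90, 0),
    ("2001-03-06 14:45", 34, 0),
]


def average_by_hour(samples):
    """Return {hour: (average CPU percent, average page-ins)}."""
    totals = defaultdict(lambda: [0, 0, 0])  # hour -> [cpu sum, pi sum, count]
    for stamp, cpu, page_ins in samples:
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour
        bucket = totals[hour]
        bucket[0] += cpu
        bucket[1] += page_ins
        bucket[2] += 1
    return {
        hour: (cpu_sum / count, pi_sum / count)
        for hour, (cpu_sum, pi_sum, count) in sorted(totals.items())
    }


hourly = average_by_hour(SAMPLES)
# Hours whose average page-in count is nonzero are candidates for an alert.
paging_hours = [hour for hour, (_, avg_pi) in hourly.items() if avg_pi > 0]
print(hourly)
print(paging_hours)
```

Grouping on the hour alone, and ignoring the date, is what folds several days of samples into a single 24-hour profile.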

This type of page-in information can also be used to predict the overall load on the database server. One of the jobs of the Oracle DBA is to predict when additional hardware resources are needed for the database. By plotting the consumption of memory and CPU and performing a linear regression, the Oracle DBA can accurately predict when additional RAM or CPU will be required.

A predictive report for RAM memory on an Oracle server
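To make the linear regression step concrete, the sketch below fits a least-squares line to weekly RAM consumption and extrapolates to the week in which installed RAM would be exhausted. The figures are invented, and the function names and the 2,048 MB capacity are assumptions chosen only for the example. Real workloads rarely grow this neatly, so treat the result as a rough planning estimate.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def week_at_capacity(capacity, slope, intercept):
    """Week at which the fitted line reaches `capacity`, or None if usage is not growing."""
    if slope <= 0:
        return None
    return (capacity - intercept) / slope


# Hypothetical peak RAM use in MB, measured once a week.
weeks = [0, 1, 2, 3, 4]
ram_mb = [1000, 1100, 1200, 1300, 1400]

slope, intercept = fit_line(weeks, ram_mb)
full_week = week_at_capacity(2048, slope, intercept)
print(f"growth {slope:.0f} MB/week, capacity reached around week {full_week:.1f}")
```

The same two functions apply to CPU consumption by swapping in CPU percentages and a capacity of 100.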

Summary

This article is intended to give you a high-level, overall understanding of monitoring and tuning the hardware on your Oracle database server. The main points are to carefully look for shortages in CPU, RAM, and disk I/O bottlenecks. Next week's article will look at tuning within the confines of the Oracle database, beginning with an overview of Oracle instance tuning.